MODIFIED RAILROAD WORM RED LUCIFERASE CODING
SEQUENCES

Technical Field 

This invention is in the field of molecular biology and medicine. More
specifically, it relates to modified forms of Phrixothrix hirtus (railroad worm) red
luciferase. The modified forms of this red luciferase described herein are useful in a
wide variety of applications. The present invention describes polynucleotide
sequences, polypeptide sequences, expression cassettes, vectors, transformed cells,
transgenic animals, and methods of use thereof.

BACKGROUND 

In certain organisms, bioluminescence (the ability to emit light) is mediated by
the luciferase enzyme. Photoproteins such as luciferase have been used for more than
a decade as biological labels to aid in the study of gene expression in cell culture or
using excised tissues (Campbell, A. K. 1988. Chemiluminescence. Principles and
applications in biology and medicine. Ellis Horwood Ltd and VCH Verlagsgesellschaft
mbH, Chichester, England; Hastings, J. W. (1996) Gene 173:5-11; Morrey, J. D., et
al., (1992) J. Acquir. Immune Defic. Syndr. 5:1195-203; Morrey, J. D., et al., (1991)
J. Virol. 65:5045-51). Further, low-light imaging of internal bioluminescent signals has
been used to study temporal and spatial gene regulation in relatively thin or nearly
transparent organisms (Millar, A. J., et al., (1992) Plant Cell 4:1075-87; Stanewsky, R.,
et al., (1997) EMBO J. 16:5006-18; Brandes, C., et al., (1996) Neuron 16:687-92).
External detection of internal light penetrating the opaque animal tissues has been
described (Contag, P. R., et al., (1998) Nature Med. 4:245-7; Contag, C. H., et al.,
(1997) Photochem. Photobiol. 66:523-31; Contag, C. H., et al., (1995) Mol. Microbiol.
18:593-603).

Wild-type and modified luciferase coding sequences have been obtained from
lux genes (prokaryotic genes encoding a luciferase activity) and luc genes (eukaryotic
genes encoding a luciferase activity), including, but not limited to, the following: B.A.
Sherf and K.V. Wood, U.S. Patent No. 5,670,356, issued 23 September 1997; Kazami,
J., et al., U.S. Patent No. 5,604,123, issued 18 February 1997; S. Zenno, et al., U.S.
Patent No. 5,618,722; K.V. Wood, U.S. Patent No. 5,650,289, issued 22 July 1997;
K.V. Wood, U.S. Patent No. 5,641,641, issued 24 June 1997; N. Kajiyama and E.
Nakano, U.S. Patent No. 5,229,285, issued 20 July 1993; M.J. Cormier and W.W.
Lorenz, U.S. Patent No. 5,292,658, issued 8 March 1994; M.J. Cormier and W.W.
Lorenz, U.S. Patent No. 5,418,155, issued 23 May 1995; de Wet, J.R., et al., Molec.
Cell. Biol. 7:725-737, 1987; Tatsumi, H., et al., Biochim. Biophys. Acta 1131:161-165,
1992; and Wood, K.V., et al., Science 244:700-702, 1989. Eukaryotic luciferase
catalyzes a reaction using luciferin as a luminescent substrate to produce light, whereas
prokaryotic luciferase catalyzes a reaction using an aldehyde as a luminescent substrate
to produce light. A yellow-green luciferase with an emission peak of about 540 nm is
commercially available from Promega, Madison, WI under the name pGL3. A red
luciferase with an emission peak of about 610 nm is described, for example, in Contag
et al. (1998) Nat. Med. 4:245-247 and Kajiyama et al. (1991) Prot. Eng. 4:691-693.

However, prior to the present disclosure, optimized luciferase sequences obtained
from Phrixothrix hirtus (railroad worm or RR) have not been described. Thus, the
present invention provides novel luciferase sequences useful in molecular biological
studies and methods and for the generation of light-producing transgenic animals.

Summary of The Invention 

The present invention is directed to sequences encoding functional (e.g., able to
mediate the production of light in the presence of an appropriate substrate, for
example, luciferin, under appropriate conditions) red luciferase of Phrixothrix hirtus.
In one aspect, the invention comprises an isolated polynucleotide having at least about
85% sequence identity to the nucleotide sequence shown in Figure 1 (SEQ ID NO:1)
or fragments thereof. Preferably, the polynucleotide exhibits at least about 90%
identity, more preferably 95% identity, and most preferably 98% identity to the
nucleotide sequence shown in Figure 1 (SEQ ID NO:1). In certain embodiments, the
isolated polynucleotide comprises a polynucleotide consisting of full-length SEQ ID
NO:1. In other embodiments, the sequences of the present invention can include
fragments of Figure 1 (SEQ ID NO:1), for example, from about 15 nucleotides up to
the number of nucleotides present in the full-length sequences described herein (e.g.,
see the Sequence Listing and Figures), including all integer values falling within the
above-described range. For example, fragments of the polynucleotide sequences of the
present invention may be 30-60 nucleotides, 60-120 nucleotides, 120-240 nucleotides,
240-480 nucleotides, 480-1000 nucleotides, 1000 to 1641 nucleotides, and all integer
values therebetween. In one embodiment, the invention includes a polynucleotide
sequence encoding a functional luciferase (i.e., one that is capable of mediating the
production of light in the presence of the appropriate substrate under appropriate
conditions), wherein the polynucleotide sequence comprises a fragment derived from
SEQ ID NO:1. Further, this aspect of the invention includes modifications of the
polynucleotide sequence including, but not limited to, the following: codon
optimization for expression in a selected cell type or organism (e.g., mice, Candida, or
Cryptococcus); removal/modification of unwanted restriction sites;
removal/modification of possible glycosylation sites; removal/modification of
C-terminal peroxisome targeting sequences; removal/modification of transcription factor
binding sites; removal/modification of palindromes; and/or removal/modification of
RNA folding structures.
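
By way of non-limiting illustration, the search for unwanted restriction sites and
palindromes that precedes their removal or modification can be sketched as follows.
This is a minimal sketch under stated assumptions: the function names, the choice of
three common restriction enzymes, and the six-nucleotide window are introduced here
for illustration only and are not taken from the sequences of the present invention.

```python
# Illustrative sketch only: the enzyme panel and window length are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition sequences of three widely used restriction enzymes.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq, sites=SITES):
    """Return (enzyme, 0-based position) for every occurrence of each site."""
    seq = seq.upper()
    hits = []
    for name, site in sites.items():
        start = seq.find(site)
        while start != -1:
            hits.append((name, start))
            start = seq.find(site, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


def find_palindromes(seq, length=6):
    """Return (position, window) for windows equal to their reverse complement."""
    seq = seq.upper()
    return [
        (i, seq[i:i + length])
        for i in range(len(seq) - length + 1)
        if seq[i:i + length] == reverse_complement(seq[i:i + length])
    ]
```

Positions reported by such a scan would then be candidates for silent nucleotide
changes, that is, changes that alter the site without altering the encoded amino acid.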

In another aspect, the invention comprises an isolated polynucleotide having at
least about 85% sequence identity to the nucleotide sequence shown in Figure 3 (SEQ
ID NO:3) or fragments thereof. Preferably, the polynucleotide exhibits at least about
90% identity, more preferably 95% identity, and most preferably 98% identity to the
nucleotide sequence shown in Figure 3 (SEQ ID NO:3). In certain embodiments, the
isolated polynucleotide comprises a polynucleotide consisting of full-length SEQ ID
NO:3. In other embodiments, the sequences of the present invention can include
fragments of Figure 3 (SEQ ID NO:3), for example, from about 15 nucleotides up to
the number of nucleotides present in the full-length sequences described herein (e.g.,
see the Sequence Listing and Figures), including all integer values falling within the
above-described range. For example, fragments of the polynucleotide sequences of the
present invention may be 30-60 nucleotides, 60-120 nucleotides, 120-240 nucleotides,
240-480 nucleotides, 480-1000 nucleotides, 1000 to 1641 nucleotides, and all integer
values therebetween. In one embodiment, the invention includes a polynucleotide
sequence encoding a functional luciferase (i.e., one that is capable of mediating the
production of light in the presence of the appropriate substrate under appropriate
conditions), wherein the polynucleotide sequence comprises a fragment derived from
SEQ ID NO:3. Further, this aspect of the invention includes modifications of the
polynucleotide sequence including, but not limited to, the following: codon
optimization for expression in a selected cell type or organism (e.g., mice, Candida, or
Cryptococcus); removal/modification of unwanted restriction sites;
removal/modification of possible glycosylation sites; removal/modification of
C-terminal peroxisome targeting sequences; removal/modification of transcription factor
binding sites; removal/modification of palindromes; and/or removal/modification of
RNA folding structures.

In another aspect, the invention includes expression cassettes comprising one
or more transcriptional and/or translational control elements operably linked to any of
the polynucleotides described herein.

In another aspect, the invention includes a host cell or transgenic animal
comprising any of the polynucleotides described herein. In certain embodiments, the
transgenic animal is a rodent (e.g., rat or mouse).

In yet another aspect, the invention includes a method for monitoring 
expression of a gene in a host cell, said method comprising monitoring the expression 
of luciferase in the host cell, said host cell comprising any expression cassette 
described herein. 

In a still further aspect, the invention includes a method for monitoring
expression of a gene in a transgenic animal, said method comprising monitoring the
expression of luciferase in the animal, said animal comprising any expression cassette
described herein.

In yet another aspect, the present invention comprises a polynucleotide, as
described above, encoding a functional luciferase wherein the polynucleotide sequence
is modified to optimize expression in a different, selected host system (e.g., plants,
yeast, etc.). Further, the polynucleotide sequence may be modified to, for example, (i)
disrupt transcriptional regulatory elements, and (ii) add or remove restriction sites.

These and other embodiments of the present invention will be apparent to those
of skill in the art in view of the teachings herein.

Brief Description of the Drawings

Figure 1 presents a modified nucleotide sequence (SEQ ID NO:1) encoding a
railroad worm red luciferase according to the present invention. Figure 1 also
presents the corresponding amino acid sequence of the luciferase (SEQ ID
NO:2).

Figure 2 is a comparison of the nucleotide sequence of the native railroad
worm red luciferase-encoding sequence (labeled RRW red LUC native; SEQ ID NO:3)
and the modified sequence shown in Figure 1 (labeled RRW red LUC optimized; SEQ
ID NO:1). Modified nucleotides are boxed and shaded. The parameters for the
alignment were as follows: FAST algorithm, ktuple=2, gap penalty=5, window size=4,
gap opening penalty=15, gap extension penalty=6.66.

Figure 3 presents a native nucleotide sequence (SEQ ID NO:3) encoding a
railroad worm red luciferase derived from Phrixothrix hirtus according to the present
invention. Figure 3 also presents the corresponding amino acid sequence of the
luciferase (SEQ ID NO:4).

Modes For Carrying Out The Invention 

Throughout this application, various publications, patents, and published patent
applications are referred to by an identifying citation to more fully describe the state of
the art to which this invention pertains.

The practice of the present invention will employ, unless otherwise indicated,
conventional techniques of molecular biology, microbiology, cell biology and
recombinant DNA, which are within the skill of the art. See, e.g., Sambrook, Fritsch,
and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition
(1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F.M. Ausubel et
al. eds., 1987); the series METHODS IN ENZYMOLOGY (Academic Press, Inc.);
PCR 2: A PRACTICAL APPROACH (M.J. McPherson, B.D. Hames and G.R. Taylor
eds., 1995); ANIMAL CELL CULTURE (R.I. Freshney, ed., 1987); "Transgenic
Animal Technology: A Laboratory Handbook," by Carl A. Pinkert (Editor), First
Edition, Academic Press; ISBN: 0125571658; and "Manipulating the Mouse Embryo:
A Laboratory Manual," Brigid Hogan, et al., ISBN: 0879693843, Publisher: Cold
Spring Harbor Laboratory Press, Pub. Date: September 1999, Second Edition.

As used in this specification and the appended claims, the singular forms "a,"
"an" and "the" include plural references unless the content clearly dictates otherwise.
Thus, for example, reference to "a polypeptide" includes a mixture of two or more
such polypeptides.

Definitions 

As used herein, certain terms will have specific meanings.

The terms "nucleic acid molecule" and "polynucleotide" are used
interchangeably and refer to a polymeric form of nucleotides of any length, either
deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may
have any three-dimensional structure, and may perform any function, known or
unknown. Non-limiting examples of polynucleotides include a gene, a gene fragment,
exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes,
cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors,
isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and
primers.

A polynucleotide is typically composed of a specific sequence of four
nucleotide bases: adenine (A); cytosine (C); guanine (G); and thymine (T) (uracil (U)
for thymine (T) when the polynucleotide is RNA). Thus, the term polynucleotide
sequence is the alphabetical representation of a polynucleotide molecule. This
alphabetical representation can be input into databases in a computer having a central
processing unit and used for bioinformatics applications such as functional genomics
and homology searching.
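
As a simple illustration of this alphabetical representation, the following sketch
checks that a string uses only the four DNA bases and converts it to the corresponding
RNA representation by substituting uracil (U) for thymine (T). The function names are
hypothetical and are used here for illustration only.

```python
# Minimal sketch; function names are illustrative assumptions.
DNA_BASES = set("ACGT")


def is_dna(seq):
    """True if seq is non-empty and contains only A, C, G and T."""
    return bool(seq) and set(seq.upper()) <= DNA_BASES


def to_rna(seq):
    """Return the RNA representation of a DNA sequence (T replaced by U)."""
    if not is_dna(seq):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return seq.upper().replace("T", "U")
```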

A "coding sequence" or a sequence which "encodes" a selected polypeptide, is
a nucleic acid molecule which is transcribed (in the case of DNA) and translated (in the
case of mRNA) into a polypeptide, for example, in vivo when placed under the control
of appropriate regulatory sequences (or "control elements"). The boundaries of the
coding sequence are typically determined by a start codon at the 5' (amino) terminus
and a translation stop codon at the 3' (carboxy) terminus. A coding sequence can
include, but is not limited to, cDNA from viral, prokaryotic or eukaryotic mRNA,
genomic DNA sequences from viral or prokaryotic DNA, and even synthetic DNA
sequences. A transcription termination sequence may be located 3' to the coding
sequence. Other "control elements" may also be associated with a coding sequence. A
DNA sequence encoding a polypeptide can be optimized for expression in a selected
cell by using the codons preferred by the selected cell to represent the DNA copy of
the desired polypeptide coding sequence. Thus, for example, railroad worm luciferase
can be codon optimized to represent preferred codon usage of mammalian gene
sequences. "Encoded by" refers to a nucleic acid sequence which codes for a
polypeptide sequence, wherein the polypeptide sequence or a portion thereof contains
an amino acid sequence of at least 3 to 5 amino acids, more preferably at least 8 to 10
amino acids, and even more preferably at least 15 to 20 amino acids from a polypeptide
encoded by the nucleic acid sequence. Also encompassed are polypeptide sequences
which are immunologically identifiable with a polypeptide encoded by the sequence.
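
The codon optimization described above can be sketched, by way of example only, as a
codon-by-codon substitution that leaves the encoded polypeptide unchanged. In the
sketch below the standard genetic code is used for translation; the PREFERRED table
is an assumed, illustrative choice of one codon per amino acid and is not the codon
usage actually applied to SEQ ID NO:1. In practice such a table would be derived from
codon-usage data for the selected cell or organism.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# ASSUMPTION: one illustrative "preferred" codon per amino acid ("*" = stop).
PREFERRED = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG",
    "M": "ATG", "F": "TTC", "P": "CCC", "S": "AGC", "T": "ACC", "W": "TGG",
    "Y": "TAC", "V": "GTG", "*": "TGA",
}


def codons(cds):
    """Split a coding sequence into codons; its length must be a multiple of 3."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a coding sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[c] for c in codons(cds))


def optimize(cds):
    """Replace every codon with the preferred synonymous codon."""
    return "".join(PREFERRED[CODON_TABLE[c]] for c in codons(cds))
```

Because each substitution is synonymous, translate(optimize(cds)) equals translate(cds)
for any valid coding sequence, which provides a convenient check on the result.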

A "transcription factor" typically refers to a protein (or polypeptide) which
affects the transcription, and accordingly the expression, of a specified gene. A
transcription factor may refer to a single polypeptide transcription factor, one or more
polypeptides acting sequentially or in concert, or a complex of polypeptides.

Typical "control elements" include, but are not limited to, transcription
promoters, transcription enhancer elements, cis-acting transcription regulating
elements (transcription regulators, e.g., a cis-acting element that affects the
transcription of a gene, for example, a region of a promoter with which a transcription
factor interacts to induce or repress expression of a gene), transcription initiation
signals (e.g., TATA box), basal promoters, transcription termination signals, as well as
polyadenylation sequences (located 3' to the translation stop codon), sequences for
optimization of initiation of translation (located 5' to the coding sequence), translation
enhancing sequences, and translation termination sequences. Transcription promoters
can include, for example, inducible promoters (where expression of a polynucleotide
sequence operably linked to the promoter is induced by an analyte, cofactor, regulatory
protein, etc.), repressible promoters (where expression of a polynucleotide sequence
operably linked to the promoter is repressed by an analyte, cofactor, regulatory protein,
etc.), and constitutive promoters.

"Expression enhancing sequences," also referred to as "enhancer sequences" or
"enhancers," typically refer to control elements that improve transcription or
translation of a polynucleotide relative to the expression level in the absence of such
control elements (for example, promoters, promoter enhancers, enhancer elements, and
translational enhancers (e.g., Shine-Dalgarno sequences)).

The term "modulation" refers to both inhibition, including partial inhibition, as
well as stimulation. Thus, for example, a compound that modulates expression of a
reporter sequence may either inhibit that expression, either partially or completely, or
stimulate expression of the sequence.

"Purified polynucleotide" refers to a polynucleotide of interest or fragment
thereof which is essentially free, e.g., contains less than about 50%, preferably less than
about 70%, and more preferably less than about 90%, of the protein with which the
polynucleotide is naturally associated. Techniques for purifying polynucleotides of
interest are well-known in the art and include, for example, disruption of the cell
containing the polynucleotide with a chaotropic agent and separation of the
polynucleotide(s) and proteins by ion-exchange chromatography, affinity
chromatography and sedimentation according to density.

A "heterologous sequence" typically refers to either (i) a nucleic acid sequence
that is not normally found in the cell or organism of interest, or (ii) a nucleic acid
sequence introduced at a genomic site wherein the nucleic acid sequence does not
normally occur in nature at that site. For example, a DNA sequence encoding a
polypeptide can be obtained from yeast and introduced into a bacterial cell. In this
case the yeast DNA sequence is "heterologous" to the native DNA of the bacterial cell.
Alternatively, a promoter sequence, for example, from a Tie2 gene can be introduced
into the genomic location of a fosB gene. In this case the Tie2 promoter sequence is
"heterologous" to the native fosB genomic sequence.

A "polypeptide" is used in its broadest sense to refer to a compound of two or
more subunit amino acids, amino acid analogs, or other peptidomimetics. The subunits
may be linked by peptide bonds or by other bonds, for example, ester, ether, etc. As
used herein, the term "amino acid" refers to either natural and/or unnatural or synthetic
amino acids, including glycine and both the D or L optical isomers, and amino acid
analogs and peptidomimetics. A peptide of three or more amino acids is commonly
called an oligopeptide if the peptide chain is short. If the peptide chain is long, the
peptide is typically called a polypeptide or a protein. Amino acids are shown either by
three letter or one letter abbreviations as follows:



Amino Acid        Three Letter Abbreviation    One Letter Abbreviation

Alanine           Ala                          A
Cysteine          Cys                          C
Aspartic Acid     Asp                          D
Glutamic Acid     Glu                          E
Phenylalanine     Phe                          F
Glycine           Gly                          G
Histidine         His                          H
Isoleucine        Ile                          I
Lysine            Lys                          K
Leucine           Leu                          L
Methionine        Met                          M
Asparagine        Asn                          N
Proline           Pro                          P
Glutamine         Gln                          Q
Arginine          Arg                          R
Serine            Ser                          S
Threonine         Thr                          T
Valine            Val                          V
Tryptophan        Trp                          W
Tyrosine          Tyr                          Y

"Operably linked" refers to an arrangement of elements wherein the 
components so described are configured so as to perform their usual function. Thus, a
given promoter that is operably linked to a coding sequence (e.g., a reporter
expression cassette) is capable of effecting the expression of the coding sequence when
the proper enzymes are present. The promoter or other control elements need not be
contiguous with the coding sequence, so long as they function to direct the expression
thereof. For example, intervening untranslated yet transcribed sequences can be
present between the promoter sequence and the coding sequence and the promoter
sequence can still be considered "operably linked" to the coding sequence.

"Recombinant" as used herein to describe a nucleic acid molecule means a
polynucleotide of genomic, cDNA, semi-synthetic, or synthetic origin which, by virtue
of its origin or manipulation: (1) is not associated with all or a portion of the
polynucleotide with which it is associated in nature; and/or (2) is linked to a
polynucleotide other than that to which it is linked in nature. The term "recombinant"
as used with respect to a protein or polypeptide means a polypeptide produced by
expression of a recombinant polynucleotide. "Recombinant host cells," "host cells,"
"cells," "cell lines," "cell cultures," and other such terms denoting prokaryotic
microorganisms or eukaryotic cell lines cultured as unicellular entities, are used
interchangeably, and refer to cells which can be, or have been, used as recipients for
recombinant vectors or other transfer DNA, and include the progeny of the original
cell which has been transfected. It is understood that the progeny of a single parental
cell may not necessarily be completely identical in morphology or in genomic or total
DNA complement to the original parent, due to accidental or deliberate mutation.
Progeny of the parental cell which are sufficiently similar to the parent to be
characterized by the relevant property, such as the presence of a nucleotide sequence
encoding a desired peptide, are included in the progeny intended by this definition, and
are covered by the above terms.

An "isolated polynucleotide" molecule is a nucleic acid molecule separate and
discrete from the whole organism with which the molecule is found in nature; or a
nucleic acid molecule devoid, in whole or part, of sequences normally associated with
it in nature; or a sequence, as it exists in nature, but having heterologous sequences (as
defined above) in association therewith.

Techniques for determining nucleic acid and amino acid "sequence identity"
also are known in the art. Typically, such techniques include determining the
nucleotide sequence of the mRNA for a gene and/or determining the amino acid
sequence encoded thereby, and comparing these sequences to a second nucleotide or
amino acid sequence. In general, "identity" refers to an exact nucleotide-to-nucleotide
or amino acid-to-amino acid correspondence of two polynucleotide or polypeptide
sequences, respectively. Two or more sequences (polynucleotide or amino acid) can
be compared by determining their "percent identity." The percent identity of two
sequences, whether nucleic acid or amino acid sequences, is the number of exact
matches between two aligned sequences divided by the length of the shorter sequence
and multiplied by 100. An approximate alignment for nucleic acid sequences is
provided by the local homology algorithm of Smith and Waterman, Advances in
Applied Mathematics 2:482-489 (1981). This algorithm can be applied to amino acid
sequences by using the scoring matrix developed by Dayhoff, Atlas of Protein
Sequences and Structure, M.O. Dayhoff ed., 5 suppl. 3:353-358, National Biomedical
Research Foundation, Washington, D.C., USA, and normalized by Gribskov, Nucl.
Acids Res. 14(6):6745-6763 (1986). An exemplary implementation of this algorithm
to determine percent identity of a sequence is provided by the Genetics Computer
Group (Madison, WI) in the "BestFit" utility application. The default parameters for
this method are described in the Wisconsin Sequence Analysis Package Program
Manual, Version 8 (1995) (available from Genetics Computer Group, Madison, WI).
A preferred method of establishing percent identity in the context of the present
invention is to use the MPSRCH package of programs copyrighted by the University
of Edinburgh, developed by John F. Collins and Shane S. Sturrok, and distributed by
IntelliGenetics, Inc. (Mountain View, CA). From this suite of packages the
Smith-Waterman algorithm can be employed where default parameters are used for the
scoring table (for example, gap open penalty of 12, gap extension penalty of one, and a
gap of six). From the data generated the "Match" value reflects "sequence identity."
Other suitable programs for calculating the percent identity or similarity between
sequences are generally known in the art, for example, another alignment program is
BLAST, used with default parameters. For example, BLASTN and BLASTP can be
used with the following default parameters: genetic code = standard; filter = none;
strand = both; cutoff = 60; expect = 10; Matrix = BLOSUM62; Descriptions = 50
sequences; sort by = HIGH SCORE; Databases = non-redundant, GenBank + EMBL
+ DDBJ + PDB + GenBank CDS translations + Swiss protein + Spupdate + PIR.
Details of these programs can be found at the following internet address:
http://www.ncbi.nlm.gov/cgi-bin/BLAST.
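
The percent identity calculation defined above (exact matches between two aligned
sequences, divided by the length of the shorter sequence, multiplied by 100) can be
sketched as follows. The sketch assumes that the two sequences have already been
aligned by a separate program and are supplied as equal-length strings in which "-"
marks a gap; the function name and the gap symbol are assumptions made for
illustration, and the sketch does not itself perform a Smith-Waterman alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal aligned length.

    Matches are counted over aligned columns in which both residues are equal
    and neither is a gap; the denominator is the ungapped length of the
    shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    a, b = aligned_a.upper(), aligned_b.upper()
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    shorter = min(len(a.replace(gap, "")), len(b.replace(gap, "")))
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter
```

For example, two aligned eight-nucleotide sequences differing at one position give
7/8 x 100 = 87.5% identity, which would satisfy an "at least about 85%" threshold but
not an "at least about 90%" threshold.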

One of skill in the art can readily determine the proper search parameters to use
for a given sequence in the above programs. For example, the search parameters may
vary based on the size of the sequence in question. Thus, for example, a representative
embodiment of the present invention would include a polynucleotide comprising X
contiguous nucleotides wherein (i) the X contiguous nucleotides have at least about a
selected level of percent identity relative to Y contiguous nucleotides of one or more
of the sequences described herein or fragment thereof, and (ii) for search purposes X
equals Y, wherein Y is a selected reference polynucleotide of defined length (for
example, a length of from 15 nucleotides up to the number of nucleotides present in a
selected full-length sequence, e.g., SEQ ID NO:1, 1641 nucleotides, including all
integer values falling within the above-described ranges). A "fragment" of a
polynucleotide refers to any length polynucleotide molecule derived from a larger
polynucleotide described herein (i.e., Y contiguous nucleotides, where X=Y as just
described). Exemplary fragment lengths include, but are not limited to, at least about 6
contiguous nucleotides, at least about 50 contiguous nucleotides, about 100
contiguous nucleotides, about 250 contiguous nucleotides, about 500 contiguous
nucleotides, or at least about 1000 contiguous nucleotides or more, wherein such
contiguous nucleotides are derived from a larger sequence of contiguous nucleotides.
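
The comparison of X contiguous nucleotides against Y contiguous nucleotides of a
reference sequence, where X equals Y, can be illustrated with the following sketch.
It enumerates every contiguous fragment of a given length and reports the highest
percent identity between a query and any equal-length window of the reference. This
is a simplified, ungapped comparison offered for illustration; it is not the
Smith-Waterman or BLAST procedure referred to above, and the function names are
hypothetical.

```python
def windows(seq, length):
    """Return every contiguous fragment of the given length, in order."""
    if not 1 <= length <= len(seq):
        raise ValueError("length must be between 1 and len(seq)")
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def best_ungapped_identity(query, reference):
    """Highest percent identity between the query (X nucleotides) and any
    window of Y = X contiguous nucleotides of the reference, without gaps."""
    query, reference = query.upper(), reference.upper()
    best = 0.0
    for window in windows(reference, len(query)):
        matches = sum(q == r for q, r in zip(query, window))
        best = max(best, 100.0 * matches / len(query))
    return best
```

A reference of 1641 nucleotides has 1641 - 15 + 1 = 1627 contiguous fragments of 15
nucleotides, and a single fragment of 1641 nucleotides (the full-length sequence).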

The purified polynucleotides and polynucleotides used in construction of
expression cassettes of the present invention include the sequences disclosed herein as
well as related polynucleotide sequences having sequence identity of approximately
80% to 100% and integer values therebetween. Typically the percent identities
between the sequences disclosed herein and the claimed sequences are at least about
85-90%, preferably at least about 90-95%, more preferably at least about 95-98%, and
most preferably at least about 98-100% sequence identity (including all integer values
falling within these described ranges). These percent identities are, for example,
relative to the claimed sequences, or other sequences of the present invention, when
the sequences of the present invention are used as the query sequence.

Alternatively, the degree of sequence similarity between polynucleotides can be
determined by hybridization of polynucleotides under conditions that form stable
duplexes between homologous regions, followed by digestion with
single-stranded-specific nuclease(s), and size determination of the digested fragments.
Two DNA, or two polypeptide, sequences are "substantially homologous" to each other
when the sequences exhibit at least about 80%-100% or any integer value therebetween,
preferably at least about 85%-90%, more preferably at least about 90%-95%, more
preferably at least about 95%-98%, and even more preferably 98%-100% sequence
identity over a defined length of the molecules, as determined using the methods
above. As used herein, substantially homologous also refers to sequences showing
complete identity to the specified DNA or polypeptide sequence. DNA sequences that
are substantially homologous can be identified in a Southern hybridization experiment
under, for example, stringent conditions, as defined for that particular system.
Defining appropriate hybridization conditions is within the skill of the art. See, e.g.,
Sambrook et al., supra; DNA Cloning, supra; Nucleic Acid Hybridization, supra.

The degree of sequence identity between two nucleic acid molecules affects the
efficiency and strength of hybridization events between such molecules. A partially
identical nucleic acid sequence will at least partially inhibit a completely identical
sequence from hybridizing to a target molecule. Inhibition of hybridization of the
completely identical sequence can be assessed using hybridization assays that are well
known in the art (e.g., Southern blot, Northern blot, solution hybridization, or the like,
see Sambrook, et al., Molecular Cloning: A Laboratory Manual, Second Edition,
(1989) Cold Spring Harbor, N.Y.). Such assays can be conducted using varying
degrees of selectivity, for example, using conditions varying from low to high
stringency. If conditions of low stringency are employed, the absence of non-specific
binding can be assessed using a secondary probe that lacks even a partial degree of
sequence identity (for example, a probe having less than about 30% sequence identity
with the target molecule), such that, in the absence of non-specific binding events, the
secondary probe will not hybridize to the target.

When utilizing a hybridization-based detection system, a nucleic acid probe is
chosen that is complementary to a target nucleic acid sequence, and then by selection
of appropriate conditions the probe and the target sequence "selectively hybridize," or
bind, to each other to form a hybrid molecule. A nucleic acid molecule that is capable
of hybridizing selectively to a target sequence under "moderately stringent" conditions
typically hybridizes under conditions that allow detection of a target nucleic acid
sequence of at least about 10-14 nucleotides in length having at least approximately
70% sequence identity with the sequence of the selected nucleic acid probe. Stringent
hybridization conditions typically allow detection of target nucleic acid sequences of
at least about 10-14 nucleotides in length having a sequence identity of greater than
about 90-95% with the sequence of the selected nucleic acid probe. Hybridization
conditions useful for probe/target hybridization where the probe and target have a
specific degree of sequence identity can be determined as is known in the art (see, for
example, Nucleic Acid Hybridization: A Practical Approach, editors B.D. Hames and
S.J. Higgins, (1985) Oxford; Washington, DC; IRL Press).
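
By way of illustration only, the approximate identity thresholds given above can be
expressed as a short calculation. The following Python sketch assumes the probe and
target have already been aligned to equal length; the function names and the exact
cut-off values (70% and 90%) are assumptions chosen to mirror the approximate figures
in the text, not a prescribed procedure.

```python
def percent_identity(probe: str, target: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if not probe or len(probe) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(probe.upper(), target.upper()) if a == b)
    return 100.0 * matches / len(probe)


def stringency_class(identity: float) -> str:
    """Map a percent identity onto the approximate categories used in the text.

    The cut-offs are illustrative assumptions: about 90% for "stringent"
    and about 70% for "moderately stringent".
    """
    if identity >= 90.0:
        return "stringent"
    if identity >= 70.0:
        return "moderately stringent"
    return "below moderate threshold"


if __name__ == "__main__":
    ident = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(ident, stringency_class(ident))
```

A real hybridization outcome also depends on salt, temperature, and probe length, as
discussed below; the sketch addresses sequence identity alone.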

With respect to stringency conditions for hybridization, it is well known in the
art that numerous equivalent conditions can be employed to establish a particular
stringency by varying, for example, the following factors: the length and nature of
probe and target sequences, base composition of the various sequences, concentrations
of salts and other hybridization solution components, the presence or absence of
blocking agents in the hybridization solutions (e.g., formamide, dextran sulfate, and
polyethylene glycol), hybridization reaction temperature and time parameters, as well
as varying wash conditions. A particular set of hybridization conditions is selected
following standard methods in the art (see, for example, Sambrook, et al., Molecular
Cloning: A Laboratory Manual, Second Edition, (1989) Cold Spring Harbor, N.Y.).

A "vector" is capable of transferring gene sequences to target cells. Typically,
"vector construct," "expression vector," and "gene transfer vector," mean any nucleic
acid construct capable of directing the expression of a gene of interest and which can
transfer gene sequences to target cells. Thus, the term includes cloning and
expression vehicles, as well as integrating vectors.

"Nucleic acid expression vector" or "expression cassette" refers to an assembly
that is capable of directing the expression of a sequence or gene of interest. The
nucleic acid expression vector includes a promoter that is operably linked to the
sequences or gene(s) of interest. Other control elements may be present as well.
Expression cassettes described herein may be contained within a plasmid construct. In
addition to the components of the expression cassette, the plasmid construct may also
include a bacterial origin of replication, one or more selectable markers, a signal which
allows the plasmid construct to exist as single-stranded DNA (e.g., an M13 origin of
replication), a multiple cloning site, and a "mammalian" origin of replication (e.g., an
SV40 or adenovirus origin of replication).

An "expression cassette" comprises any nucleic acid construct capable of
directing the expression of a gene/coding sequence of interest. Such cassettes can be
constructed into a "vector," "vector construct," "expression vector," or "gene transfer
vector," in order to transfer the expression cassette into target cells. Thus, the term
includes cloning and expression vehicles, as well as viral vectors.

A "light generating protein" or "light-emitting protein" is a bioluminescent or
fluorescent protein capable of producing light typically in the range of 200 nm to 1100
nm, preferably in the visible spectrum (i.e., between approximately 350 nm and 800
nm). Bioluminescent proteins produce light through a chemical reaction (typically
requiring a substrate, energy source, and oxygen). Fluorescent proteins produce light
through the absorption and re-emission of radiation (such as with green fluorescent
protein). Examples of bioluminescent proteins include, but are not limited to, the
following: "luciferase," which, unless stated otherwise, includes procaryotic (e.g.,
bacterial lux-encoded) and eucaryotic (e.g., firefly luc-encoded) luciferases, as well as
variants possessing varied or altered optical properties, such as luciferases that produce
different colors of light (e.g., Kajiyama, N., and Nakano, E., Protein Engineering
4(6):691-693 (1991)); and "photoproteins," for example, calcium activated
photoproteins (e.g., Lewis, J.C., et al., Fresenius J. Anal. Chem. 366(6-7):760-768
(2000)). Examples of fluorescent proteins include, but are not limited to, green,
yellow, cyan, blue, and red fluorescent proteins (e.g., Hadjantonakis, A.K., et al.,
Histochem. Cell Biol. 115(1):49-58 (2001)).

"Bioluminescent protein substrate" describes a substrate of a light-generating
protein, e.g., luciferase enzyme, that generates an energetically decayed substrate (e.g.,
luciferin) and a photon of light typically with the addition of an energy source, such as
ATP or FMNH2, and oxygen. Examples of such substrates include, but are not limited
to, decanal in the bacterial lux system, 4,5-dihydro-2-(6-hydroxy-2-benzothiazolyl)-4-thiazolecarboxylic
acid (or simply called luciferin) in the firefly luciferase (luc) system,
"panal" in the bioluminescent fungus Panellus stipticus system (Tetrahedron 44:1597-1602,
1988) and N-iso-valeryl-3-aminopropanol in the earthworm Diplocardia longa
system (Biochem. 15:1001-1004, 1976). In some systems, aldehyde can be used as a
substrate for the light-generating protein.

"Light" is defined herein, unless stated otherwise, as electromagnetic radiation
having a wavelength of between about 200 nm (e.g., for UV-C) and about 1100 nm
(e.g., infrared). The wavelength of visible light ranges between approximately 350 nm
and approximately 800 nm (i.e., between about 3,500 angstroms and about 8,000
angstroms).

"Animal" as used herein typically refers to a non-human mammal, including,
without limitation, farm animals such as cattle, sheep, pigs, goats and horses;
domestic mammals such as dogs and cats; laboratory animals including rodents such
as mice, rats and guinea pigs; birds, including domestic, wild and game birds such as
chickens, turkeys and other gallinaceous birds, ducks, geese, and the like. The term
does not denote a particular age. Thus, both adult and newborn individuals are
intended to be covered.

A "transgenic animal" refers to a genetically engineered animal or offspring of
genetically engineered animals. A transgenic animal usually contains material from at
least one unrelated organism, such as from a virus, plant, or other animal. The "non-human
animals" of the invention include vertebrates such as rodents, non-human
primates, sheep, dogs, cows, amphibians, birds, fish, insects, reptiles, etc. The term
"chimeric animal" is used to refer to animals in which the heterologous gene is found,
or in which the heterologous gene is expressed in some but not all cells of the animal.

A "gene" as used in the context of the present invention is a sequence of
nucleotides in a genetic nucleic acid (chromosome, plasmid, etc.) with which a genetic
function is associated. A gene is a hereditary unit, for example of an organism,
comprising a polynucleotide sequence (e.g., a DNA sequence for mammals) that
occupies a specific physical location (a "gene locus" or "genetic locus") within the
genome of an organism. A gene can encode an expressed product, such as a
polypeptide or a polynucleotide (e.g., tRNA). Alternatively, a gene may define a
genomic location for a particular event/function, such as the binding of proteins and/or
nucleic acids (e.g., phage attachment sites), wherein the gene does not encode an
expressed product. Typically, a gene includes coding sequences, such as polypeptide
encoding sequences, and non-coding sequences, such as promoter sequences,
polyadenylation sequences, and transcriptional regulatory sequences (e.g., enhancer
sequences). Many eucaryotic genes have "exons" (coding sequences) interrupted by
"introns" (non-coding sequences). In certain cases, a gene may share sequences with
another gene(s) (e.g., overlapping genes). It is noted that in the general population,
wild-type genes may include multiple prevalent versions that contain alterations in
sequence relative to each other and yet do not cause a discernible pathological effect.
These variations are designated "polymorphisms" or "allelic variations."

Before describing the present invention in detail, it is to be understood that this
invention is not limited to particular formulations or method parameters as such may,
of course, vary. It is also to be understood that the terminology used herein is for the
purpose of describing particular embodiments of the invention only, and is not intended
to be limiting.

Although a number of methods and materials similar or equivalent to those
described herein can be used in the practice of the present invention, the preferred
materials and methods are described herein.

General Overview

Described herein are native and modified forms of railroad worm red
luciferase. The native coding sequence was derived from Phrixothrix hirtus. The
present invention is directed to sequences encoding functional (e.g., able to mediate
the production of light under appropriate conditions) red luciferase of Phrixothrix
hirtus. Native polynucleotide and polypeptide red luciferase sequences (SEQ ID NO:3
and SEQ ID NO:4, respectively), as well as modified, optimized polynucleotide and
polypeptide sequences (SEQ ID NO:1 and SEQ ID NO:2, respectively) are taught
herein. In one aspect, the invention comprises an isolated polynucleotide or
polypeptide having at least about 85% sequence identity to the sequences shown in
Figure 1 (SEQ ID NO:1 and SEQ ID NO:2) or fragments thereof. In another aspect,
the invention comprises an isolated polynucleotide or polypeptide having at least about
85% sequence identity to the sequences shown in Figure 3 (SEQ ID NO:3 and SEQ ID
NO:4) or fragments thereof. Preferably, the sequences exhibit at least about 90%
sequence identity, more preferably 95% sequence identity, and most preferably 98%
sequence identity to the sequences described herein. In certain embodiments, the
isolated polynucleotide sequence comprises a polynucleotide consisting of full-length
SEQ ID NO:1 and/or SEQ ID NO:3. In certain embodiments, the isolated polypeptide
sequence comprises a polypeptide consisting of full-length SEQ ID NO:2 and/or SEQ
ID NO:4. In other embodiments, the sequences of the present invention can include
fragments of the polynucleotides described herein, for example, from about 15
nucleotides up to the number of nucleotides present in the full-length sequences
described herein (e.g., see the Sequence Listing and Figures), including all integer
values falling within the above-described range. For example, fragments of the
polynucleotide sequences of the present invention may be 30-60 nucleotides, 60-120
nucleotides, 120-240 nucleotides, 240-480 nucleotides, 480-1000 nucleotides, 1000 to
1641 nucleotides, and all integer values therebetween. In one embodiment, the
invention includes a polynucleotide sequence encoding a functional luciferase (i.e., one
that is capable of mediating the production of light, for example, in the presence of the
appropriate substrate under appropriate conditions), wherein the polynucleotide
sequence comprises a fragment. Further, this aspect of the invention includes
modifications of the polynucleotide sequences encoding polypeptide sequences
including, but not limited to, the following: codon optimization for expression in a
selected cell type or organism (for example, human, rodent (e.g., mouse), Candida, or
Cryptococcus); removal/modification of unwanted restriction sites;
removal/modification of possible glycosylation sites; removal/modification of C-terminal
peroxisome targeting sequences; removal/modification of transcription factor
binding sites; removal/modification of palindromes; and/or removal/modification of
RNA folding structures. The invention also includes polypeptides encoded by the
above-described polynucleotides or fragments thereof.

Unlike the most widely studied and modified luciferase gene, which is derived
from the firefly Photinus pyralis, modifications of RR red luciferase have not
heretofore been described. These novel sequences are useful in a wide variety of
applications, including all applications where luciferase is used as a reporter gene.
Advantages of the present invention include, but are not limited to, (1) increasing
expression of RR red luciferase in host cells (in vivo and in vitro), for instance by
optimizing codon usage to reflect that of the host cell; (2) obtaining expression of RR
red luciferase that is unbiased by peroxisomal physiology; (3) obtaining a reporter gene
that is genetically neutral in that it contains no major genetic regulatory elements,
palindromic sequences and/or RNA structures (e.g., hairpins) that interfere with
expression; and (4) obtaining a luciferase that provides reliability and convenience in
diverse applications.



Isolation and Sequencing of the Native Railroad Worm Red Luciferase

Originally the starting sequence for optimization was the sequence presented as
GENBANK Accession No. AF139645, which was based on the sequence of a cloned
cDNA molecule (PhRE, described in Viviani, V.R., et al., Biochemistry 38:8271-8279,
1999). The originally optimized sequence was designated RRLUCX. However, the
RRLUCX sequence did not encode a polypeptide that produced light. The original
clone (PhRE) was independently sequenced and several sequence errors were
discovered relative to the AF139645 sequence. The correct sequence of the original
clone is presented in the top line of Figure 2 (SEQ ID NO:3) and in Figure 3.

Modifications to Railroad Worm Red Luciferase

To improve the general suitability of luciferase in molecular biological
applications, a modified form of the luciferase gene from Phrixothrix hirtus
(railroad worm or RR) has been developed. The Phrixothrix hirtus larva produces
both a green and a red luciferase (see, Viviani et al. (1999) Biochemistry 38(26):8271-8279).

A railroad worm red luciferase was modified to optimize expression in
mammalian cells. An exemplary modified luciferase-encoding sequence is shown in
Figure 1 (SEQ ID NO:1) and Figure 2 (RRW red LUC optimized). A polypeptide
translation of SEQ ID NO:1 is also presented in Figure 1. This modified luciferase was
obtained using one or more of the following procedures: (a) codon optimization to
match usage in mammalian genes, preferably without changing the amino acid
sequence of the protein; (b) removal of unwanted restriction enzyme sites, preferably
without changing the amino acid sequence; (c) removal of the peroxisome targeting
sequence (SKL) at the end of the protein; (d) removal of as many putative
transcription factor binding sites as possible; (e) removal of palindromes and repeats in
the DNA sequence; and (f) checking the mRNA for secondary structure problems (e.g.,
large hairpins, etc.). In addition, the sequence can be modified to remove possible
glycosylation sites (e.g., Asn-X-Ser/Thr).

The sequence to be modified can be any railroad worm luciferase-encoding
sequence, for example the sequence shown in Figure 2, labeled RRW red LUC native.
A preferred method of site-specifically mutating the starting sequence (e.g., any
railroad worm red luciferase-encoding sequence) is by using PCR. General procedures
for PCR are taught in MacPherson et al., PCR: A Practical Approach, (IRL Press at
Oxford University Press, (1991)). PCR conditions for each application reaction may
be empirically determined. A number of parameters influence the success of a reaction.
Among these parameters are annealing temperature and time, extension time, Mg2+
and ATP concentration, pH, and the relative concentration of primers, templates and
deoxyribonucleotides. After amplification, the resulting fragments can be detected by
agarose gel electrophoresis followed by visualization with ethidium bromide staining
and ultraviolet illumination.

Site-specific mutagenesis can also be performed using techniques known in the
art, for example using the QuikChange® kit (Stratagene, La Jolla, CA) and following
the manufacturer's directions. Site-directed mutagenesis against single-stranded
plasmid templates is described for example in Lewis et al. (1990) Nuc. Acids Res.
18:3439-3443. According to this method, a mutagenic primer designed to correct a
defective ampicillin resistance gene is used in combination with one or more primers
designed to mutate discrete regions within the target gene. Rescued antibiotic
resistance coupled with distant non-selectable mutations in the target gene results in
high frequency capture of the desired mutations.

Another method for obtaining optimized railroad worm red luciferase is
random mutagenesis to randomly alter the amino acids, followed by screening for
clones exhibiting efficient luminescence. Random mutagenesis can be performed, for
example, by generating oligonucleotide(s) to randomly alter the target DNA sequence,
for example the peroxisome targeting sequence (SKL) at the C-terminus of luciferase.
DNA containing a population of random C-terminal mutations is used to transform E.
coli cells, and ampicillin resistant colonies can be screened for bioluminescence by any
method known in the art. Those clones selected for high luciferase expression can then
be sequenced and otherwise analyzed for amino acid sequence deviation from the
natural peroxisome targeting sequence.

1. Codon Optimization

Codon optimization can be achieved, for example, by utilizing the Codon
Usage Database, available on the World Wide Web at http://www.kazusa.or.jp/codon/.
Codon usage tables were generated from human, mouse, Candida and Cryptococcus
coding sequences. This database was generated using the coding sequences located in
Genbank. Mouse and human codon usage are almost identical, varying by <5% for
each codon. Therefore, the construct made should work in both organisms. The
Cryptococcus codon use is similar (<10%) to that of mammalian cells for about three
quarters (75%) of the amino acids. In Candida, the codon usage is generally the
opposite of that of the other organisms and, therefore, a Candida-specific construct
would have to be made for optimal codon usage.
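
As an illustration of how per-codon percentages of the kind compared above can be
computed, the following Python sketch tallies codon usage for a single coding
sequence. It is a minimal sketch under stated assumptions: the function name
`codon_usage` and the toy input are hypothetical, the standard genetic code is
assumed, and real usage tables (such as those in the Codon Usage Database) are built
from many coding sequences rather than one.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_usage(cds: str) -> dict:
    """Return {codon: (amino_acid, count, percent_of_that_amino_acid)}.

    `cds` must be an in-frame DNA coding sequence containing only A, C, G, T.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {
        codon: (CODON_TABLE[codon], n, 100.0 * n / per_amino_acid[CODON_TABLE[codon]])
        for codon, n in counts.items()
    }


if __name__ == "__main__":
    # Toy sequence (hypothetical): Met-Leu-Leu-Leu-Lys-Lys
    for codon, (aa, n, pct) in sorted(codon_usage("ATGTTACTGCTGAAAAAG").items()):
        print(codon, aa, n, round(pct, 1))
```

Percentages computed this way for a native and a modified sequence can be compared
against the host organism's table to judge how closely the modified sequence tracks
host usage.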

Using a codon usage chart for human genes, the RR red luciferase was
modified so as to bring the codons close to the percentages used in mammals. Table 1
shows the amino acid residues (column: Amino Acid) and codons used (column:
Codon), with the number of each present in the native sequence (column: orig #) and
in the modified, optimized sequence (column: new #). Also, the percent of each different
codon used for each given amino acid is presented for the native sequence (column:
orig %), and the modified, optimized sequence (column: new %). Further, the percent
of each different codon used for each given amino acid is presented for typical coding
sequences in human genes (column: % in human genes), mouse genes (column: % in
mice), Candida genes (column: % in Candida), and Cryptococcus genes (column: %
in Crypto).

In preparing modified railroad worm luciferase, it is preferable to change all of
the leucine codons to CTG, as CUG is the most used leucine codon in mammalian
cells. However, less than all of these codons can also be changed. Furthermore,
leucine (or other) codons can also be changed to other codons to remove restriction
sites and transcription factor binding sites.
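
The leucine substitution described above can be sketched as a synonymous recoding
step. The Python example below is illustrative only: it assumes CTG as the preferred
codon (the most frequently used leucine codon in human codon usage tables), and the
function name `recode_leucine` and the toy sequence are hypothetical.

```python
# The six codons that encode leucine in the standard genetic code.
LEUCINE_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}


def recode_leucine(cds: str, preferred: str = "CTG") -> str:
    """Replace every leucine codon in an in-frame coding sequence with `preferred`.

    The encoded amino acid sequence is unchanged because only synonymous
    codons are swapped.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if preferred not in LEUCINE_CODONS:
        raise ValueError("preferred codon must encode leucine")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(preferred if codon in LEUCINE_CODONS else codon for codon in codons)


if __name__ == "__main__":
    # Toy sequence (hypothetical): Met-Leu-Leu-Lys
    print(recode_leucine("ATGTTACTCAAA"))
```

In practice the recoded sequence would be re-checked for newly created restriction
sites, since a codon change can introduce a site that was not present before.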



2. Removal of Unwanted Restriction Enzyme Sites

The restriction enzyme sites in the RR red gene can be mapped to identify
and/or remove unwanted restriction enzyme sites. Such modifications can be done
prior to, after or independent of the other modifications described herein (codon
optimization, etc.). In one embodiment described herein, a single SmaI site and two
PstI sites were located in the gene following codon optimization. One of the PstI sites
was introduced during codon optimization. Accordingly, nucleotides 69 and 1002 of
SEQ ID NO:1 were modified to disrupt the two PstI sites, and nucleotide 1614 of SEQ
ID NO:1 was modified to disrupt the SmaI site, each without changing the amino acid
sequence.

For ease in cloning, restriction sites are preferably added to the 5' and 3' ends of
the luciferase-encoding sequence. Preferably, these restriction sites are unique. If,
however, the added restriction sites are also found internally, the internal site can be
modified without affecting the amino acid sequence. For example, if the nucleotides
CC are added immediately before the start codon (at the 5' end), an NcoI site is created
(CCATGG). Such internal sites may be undesirable and can be readily modified
following the teachings described herein (e.g., nucleotide 990 of SEQ ID NO:1 was
modified to remove an internal NcoI site).
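
Mapping of the restriction sites named above can be illustrated with a simple string
search. The following Python sketch is an assumption-laden illustration rather than
the mapping procedure actually used: the function name `find_sites` and the test
sequence are hypothetical, while the recognition sequences (SmaI CCCGGG, PstI CTGCAG,
NcoI CCATGG) are the standard ones. Because each of these sites is its own reverse
complement, searching a single strand is sufficient.

```python
# Recognition sequences for the enzymes discussed in the text.
SITES = {"SmaI": "CCCGGG", "PstI": "CTGCAG", "NcoI": "CCATGG"}


def find_sites(seq: str, sites: dict = SITES) -> dict:
    """Return {enzyme: [1-based start positions]} for each recognition sequence."""
    seq = seq.upper()
    hits = {}
    for name, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start + 1)          # report 1-based coordinates
            start = seq.find(motif, start + 1)   # allow overlapping matches
        hits[name] = positions
    return hits


if __name__ == "__main__":
    # Adding CC in front of a start codon followed by G creates an NcoI site.
    print(find_sites("CC" + "ATGGAAACTGCAGTTTCCCGGG"))
```

A site found internally could then be disrupted by a synonymous codon change, as
described for nucleotides 69, 990, 1002 and 1614 of SEQ ID NO:1.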

3. Removal of Possible Glycosylation Sites

Native luciferase expressed in the peroxisomes or the cytosol is not typically
post-translationally modified. However, in certain applications, for example
applications in which the modified luciferase is used as part of a fusion protein and is
excreted, the resulting polypeptide may be directed into the endoplasmic reticulum or
Golgi apparatus where post-translational modifications such as N-linked glycosylation
are known to occur. Because such post-translational modifications may affect
luciferase expression, it may be desirable in these instances to remove possible
glycosylation sites.

There are two possible glycosylation sites in RR red (Asn-X-Ser/Thr). They
are both N-I-S sites and are located at amino acids 116-118 (nucleotides 347-355) and
461-463 (nucleotides 1381-1389). None, one or both of these sites may be altered, for
example, by modifying the asparagine (aa 461) to aspartic acid.
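
A scan for the Asn-X-Ser/Thr motif can be sketched with a regular expression. The
Python example below is a minimal illustration: the function name `find_sequons` and
the test peptide are hypothetical, and X is taken to be any residue, as written in the
text (many tools additionally exclude proline at X, which is not done here).

```python
import re

# Lookahead pattern so that overlapping motifs are all reported.
SEQUON = re.compile(r"(?=N.[ST])")


def find_sequons(protein: str) -> list:
    """Return [(1-based position of Asn, three-residue motif)] for each N-X-S/T match."""
    protein = protein.upper()
    return [(m.start() + 1, protein[m.start():m.start() + 3])
            for m in SEQUON.finditer(protein)]


if __name__ == "__main__":
    peptide = "MKNISAAANNTS"  # hypothetical test peptide
    print(find_sequons(peptide))
    # Changing Asn to Asp (N -> D) at a reported position removes that motif.
    print(find_sequons(peptide.replace("NIS", "DIS")))
```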

4. Removal of C-terminal Peroxisome Targeting Sequence

A major concern in the use of the native luciferases as genetic reporters is
potential intracellular partitioning into peroxisomes. The presence of this foreign
protein in peroxisomes and, moreover, the resulting competition with native host
proteins for peroxisomal transport have undefined effects on the normal cellular
physiology. Variable subcellular localization of luciferase also compromises its value as
a quantitative marker of gene activity. These potential problems reduce the general
reliability of luciferase in reporter applications. Thus, it may be desirable to remove or
render non-functional the peroxisome targeting sequence.

In RR red luciferase, a peroxisome targeting sequence (Ser-Lys-Leu) is located
at the end of the gene. In certain aspects, this sequence is changed to encode
Ile-Ala-Val by modifying native nucleotides 1630 through 1637 of SEQ ID NO:3 from
TCAAAAT to ATCGCTG.
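
The effect of this change on the C-terminal tripeptide can be checked by translating
the last three sense codons. The Python sketch below is illustrative only: the
function names and the toy sequences are hypothetical, the standard genetic code is
assumed, and the final two bases of the third codon (TTA for Leu, GTA for Val) are
assumptions, since the text gives only the first base of that codon.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def c_terminal_tripeptide(cds: str) -> str:
    """Translate the last three sense codons of an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons and CODON_TABLE[codons[-1]] == "*":
        codons = codons[:-1]  # ignore a terminal stop codon
    if len(codons) < 3:
        raise ValueError("need at least three sense codons")
    return "".join(CODON_TABLE[c] for c in codons[-3:])


def has_skl_signal(cds: str) -> bool:
    """True if the encoded protein ends in Ser-Lys-Leu."""
    return c_terminal_tripeptide(cds) == "SKL"


if __name__ == "__main__":
    native_like = "ATGGCC" + "TCAAAATTA" + "TAA"    # ...Ser-Lys-Leu-stop
    modified_like = "ATGGCC" + "ATCGCTGTA" + "TAA"  # ...Ile-Ala-Val-stop
    print(c_terminal_tripeptide(native_like), has_skl_signal(native_like))
    print(c_terminal_tripeptide(modified_like), has_skl_signal(modified_like))
```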

5. Removal of Transcription Factor Binding Sites

Any gene may contain regulatory sequences within its coding region which
could mediate genetic activity through native regulatory function or via recognition by
transcription factors in a foreign host. These sequences may alter expression of
luciferase and were, therefore, altered while keeping the codon usage optimal and
without affecting the amino acid sequence.

A table of 312 transcription factor binding sites is available in the program
MacDNASIS. The RRluc sequence was analyzed for these sites and as many as
possible were removed.

6. Removal of Palindromes

Palindromic sequences can affect expression. Using web-based programs, the
gene sequence was searched for inverted repeats, tandem repeats, and palindromes.
No inverted or tandem repeats of significant size were found. No perfect palindromes
of over 9 bp were found and only one palindrome of 10 bp and one of 9 bp were found
when one mismatch was allowed. These sequences were not altered.
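
A search for perfect palindromes of the kind described above can be sketched as a
sliding-window comparison of each window with its reverse complement. The Python
example below is a simplified illustration, not the web-based programs referred to in
the text: the function names and default window sizes are hypothetical, and it reports
only perfect reverse-complement palindromes (which necessarily have even length) and
does not allow mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_palindromes(seq: str, min_len: int = 6, max_len: int = 12) -> list:
    """Return [(1-based start, window)] for every perfect palindromic window.

    Only even window lengths are tested, because a perfect reverse-complement
    palindrome cannot have a central unpaired base.
    """
    seq = seq.upper()
    first = min_len if min_len % 2 == 0 else min_len + 1
    hits = []
    for length in range(first, max_len + 1, 2):
        for i in range(len(seq) - length + 1):
            window = seq[i:i + length]
            if window == reverse_complement(window):
                hits.append((i + 1, window))
    return hits


if __name__ == "__main__":
    print(find_palindromes("AAGAATTCTT", min_len=6, max_len=6))
```

Allowing one mismatch, as was done in the analysis described above, would require
counting mismatched positions between the window and its reverse complement instead
of testing for equality.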

Subsequently, using a web-based program, the sequence was searched for DNA
sequences repeated in the genome of primates (e.g., Alu sequences), rodents or other
mammals. None were found.



7. RNA Folding Structures

Using the mfold3.0 program located at the Macfarlane Burnet Center in
Australia (http://mfold.burnet.edu.au), several RNA folding structures were plotted.
Upon inspection of the hairpins or base paired regions plotted, there were no large
regions (>6 bases) of Gs and Cs in the base paired regions. They were either evenly
divided between G-C and A-U pairs or mostly A-U pairs.
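
As a rough complement to inspecting plotted structures, uninterrupted stretches of G
and C in the transcript can be located directly in the primary sequence. The Python
sketch below is only a proxy under stated assumptions: it does not predict folding or
identify base paired regions (that requires a folding program such as mfold), the
function name `find_gc_runs` is hypothetical, and the default length of 7 is chosen to
match the ">6 bases" criterion in the text.

```python
import re


def find_gc_runs(rna: str, min_len: int = 7) -> list:
    """Return [(1-based start, run)] for each uninterrupted G/C run of at least min_len."""
    pattern = re.compile(r"[GC]{%d,}" % min_len)
    return [(m.start() + 1, m.group()) for m in pattern.finditer(rna.upper())]


if __name__ == "__main__":
    print(find_gc_runs("AUGGCGCGCCAUGC"))
```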

8. Summary of Modifications to the RRLUCX Sequence

As discussed above, the original starting sequence for optimization was the
sequence presented as GENBANK Accession No. AF139645, which was based on the
sequence of a cloned cDNA molecule (PhRE, described in Viviani, V.R., et al.,
Biochemistry 38:8271-8279, 1999). The originally optimized sequence was designated
RRLUCX. However, the RRLUCX sequence did not produce light.

Table 2 is a summary of the nucleic acid modifications made to the RRLUCX
sequence in order to obtain the optimized, modified Red Railroad Worm luciferase
sequence (labeled "RRLUCXC" in Table 2, and "RRW red LUC optimized" in Figure
2). The nucleotide (SEQ ID NO:1) and protein (SEQ ID NO:2) sequences of the
RRW red LUC modified, optimized sequence are presented in Figure 1. Figure 2
presents a nucleotide sequence comparison between the native Red Railroad Worm
luciferase (SEQ ID NO:3) and the RRW red LUC optimized sequence (SEQ ID
NO:1).

The RRW red LUC optimized (RRLUCXC) sequence was completely
functional when expressed in host cells and produced light with a λmax of
approximately 622 nm.

Applications

The railroad worm red luciferase sequences described herein find use in a wide
variety of procedures and applications. The native, native-modified, optimized, and/or
modified-optimized red luciferases can, for example, be employed as described herein
below.

The isolated polynucleotides of the present invention may be incorporated into
expression cassettes. The expression cassettes described herein may typically include
the following components: (1) a polynucleotide comprising a first polynucleotide, for
example, having at least about 85-100% sequence identity to SEQ ID NO:1 or SEQ
ID NO:3, wherein said first polynucleotide encodes a polypeptide capable of mediating
light-production in the presence of an appropriate substrate, e.g., luciferin, under
appropriate conditions, and (2) a transcription control element operably linked to the
polynucleotide, wherein the control element is heterologous to the coding sequences of
the light generating protein. Transcription control elements may be associated with,
for example, a basal transcription promoter to confer regulation provided by such
control elements on such a basal transcription promoter.

The present invention also includes providing such expression cassettes in
vectors, comprising, for example, a suitable vector backbone and optionally a sequence
encoding a selection marker, e.g., a positive or negative selection marker. Vectors
carrying sequences encoding a red luciferase of the present invention, encoding fusions
of a red luciferase and one or more additional polypeptides, or comprising further
coding sequences can be constructed. The vectors carrying a red luciferase can be
constructed utilizing methodologies known in the art of molecular biology (see, for
example, Ausubel or Maniatis supra) in view of the teachings of the specification. For
example, a vector may be constructed by inserting, into a suitable vector backbone,
polynucleotides encoding a red luciferase, operably linked to a promoter of interest.
Suitable vector backbones may comprise an F1 origin of replication; a colE1 plasmid-derived
origin of replication; polyadenylation sequence(s); sequences encoding
antibiotic resistance (e.g., ampicillin resistance) and other regulatory or control
elements. Non-limiting examples of appropriate backbones include: pBluescriptSK
(Stratagene, La Jolla, CA); pBluescriptKS (Stratagene, La Jolla, CA) and other
commercially available vectors. Such a backbone vector may be chosen based on the
cell type into which the construct is going to be introduced (e.g., bacterial cells,
eucaryotic cells (e.g., plant cells, animal cells, fungal cells, insect cells, etc.)). The
constructs may also contain additional reporter molecules (e.g., positive or negative
selection markers).

A variety of other reporter genes may be used in the practice of the present
invention. Preferred are those that produce a protein product which is easily measured
in a routine assay. Suitable reporter genes include, but are not limited to,
chloramphenicol acetyl transferase (CAT), other light generating proteins (e.g.,
bioluminescent or fluorescent polypeptides), and beta-galactosidase. Convenient
assays include, but are not limited to, colorimetric, fluorimetric and enzymatic assays. In
one aspect, reporter genes may be employed that are expressed within the cell and
whose extracellular products are directly measured in the intracellular medium, or in an
extract of the intracellular medium of a cultured cell line. This provides advantages
over using a reporter gene whose product is secreted, since the rate and efficiency of
the secretion introduces additional variables that may complicate interpretation of the
assay.

Positive selection markers include any gene which encodes a product that can be
readily assayed. Examples include, but are not limited to, an HPRT gene (Littlefield,
J. W., Science 145:709-710 (1964)), a xanthine-guanine phosphoribosyltransferase
(GPT) gene, or an adenosine phosphoribosyltransferase (APRT) gene (Sambrook et al.,
supra), a thymidine kinase gene (i.e., "TK") and especially the TK gene of the herpes
simplex virus (Giphart-Gassler, M. et al., Mutat. Res. 214:223-232 (1989)), an nptII
gene (Thomas, K. R. et al., Cell 51:503-512 (1987); Mansour, S. L. et al., Nature
336:348-352 (1988)), or other genes which confer resistance to amino acid or
nucleoside analogues, or antibiotics, etc., for example, gene sequences which encode
enzymes such as dihydrofolate reductase (DHFR) enzyme, adenosine deaminase
(ADA), asparagine synthetase (AS), hygromycin B phosphotransferase, or a CAD
enzyme (carbamyl phosphate synthetase, aspartate transcarbamylase, and
dihydroorotase). Addition of the appropriate substrate of the positive selection marker
can be used to determine if the product of the positive selection marker is expressed;
for example, cells which do not express the positive selection marker nptII are killed
when exposed to the substrate G418 (Gibco BRL Life Technology, Gaithersburg,
MD).

The vector typically contains insertion sites for inserting other polynucleotide
sequences of interest. These insertion sites are preferably included such that there are
two sites, one site on either side of the sequences encoding the positive selection
marker, luciferase and the promoter. Insertion sites are, for example, restriction
endonuclease recognition sites, and can, for example, represent unique restriction sites.
In this way, the vector can be digested with the appropriate enzymes and the sequences
of interest ligated into the vector.

Optionally, the vector construct can contain a polynucleotide encoding a
negative selection marker. Suitable negative selection markers include, but are not
limited to, HSV-tk (see, e.g., Majzoub et al. (1996) New Engl. J. Med. 334:904-907
and U.S. Patent No. 5,464,764), as well as genes encoding various toxins including the
diphtheria toxin, the tetanus toxin, the cholera toxin and the pertussis toxin. A further
negative selection marker gene is the hypoxanthine-guanine phosphoribosyl transferase
(HPRT) gene for negative selection in 6-thioguanine.

The vectors described herein can be constructed utilizing methodologies known
in the art of molecular biology (see, for example, Ausubel or Maniatis) in view of the
teachings of the specification. As described above, the vector constructs containing
the expression cassettes are assembled by inserting the desired components into a
suitable vector backbone, for example: a vector comprising (1) a first polynucleotide
having at least about 85% sequence identity to SEQ ID NO:1, wherein said first
polynucleotide encodes a polypeptide capable of mediating light-production in the
presence of an appropriate substrate, e.g., luciferin, under appropriate conditions,
operably linked to transcription control element(s) of interest suitable to provide
expression in a selected host cell; (2) a sequence encoding a positive selection marker;
and, optionally (3) a sequence encoding a negative selection marker. In addition, the
vector construct contains insertion sites such that additional sequences of interest can
be readily inserted to flank the sequence encoding the positive selection marker and
the luciferase-encoding sequence.
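
The "at least about 85% sequence identity" criterion can be illustrated with a short Python sketch. It assumes the two sequences are already aligned and of equal length and simply counts matching positions; a real comparison would first align the sequences with an alignment program, which this sketch does not do. The two fragments are the first 16 codons of Figure 1 and Figure 3 as they read after correcting apparent scanning errors, so they are an assumption rather than an authoritative transcription, and the value for this short fragment says nothing about the full-length identity.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.lower(), seq_b.lower()))
    return 100.0 * matches / len(seq_a)


# First 16 codons as read from Figure 1 (optimized) and Figure 3 (native).
optimized = "atg gaa gaa gaa aac gtg gtg aat gga gat cgg cct agg gat ctg gtg"
native = "atg gaa gaa gaa aac gtt gtg aat gga gat cgt cct cgt gat cta gtt"
optimized = optimized.replace(" ", "")
native = native.replace(" ", "")

identity = percent_identity(optimized, native)
print(f"{identity:.1f}% identity over {len(optimized)} positions")
print("meets 85% threshold:", identity >= 85.0)
```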

A preferred method of obtaining polynucleotides and suitable regulatory sequences
(e.g., promoters) is PCR. General procedures for PCR are taught in MacPherson et al.,
PCR: A Practical Approach (IRL Press at Oxford University Press, 1991). PCR
conditions for each application reaction may be empirically determined. A number of
parameters influence the success of a reaction. Among these parameters are annealing
temperature and time, extension time, Mg2+ and ATP concentration, pH, and the
relative concentration of primers, templates and deoxyribonucleotides. After
amplification, the resulting fragments can be detected by agarose gel electrophoresis
followed by visualization with ethidium bromide staining and ultraviolet illumination.
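
Although annealing temperature is ultimately determined empirically, a starting point is often estimated from primer composition. The sketch below uses the Wallace rule of thumb for short oligonucleotides, Tm = 2(A+T) + 4(G+C) in degrees Celsius, and starts a few degrees below the lower of the two primer estimates. The rule, the 5-degree offset, and the two 18-mer primers (taken from the ends of the Figure 1 sequence as read here) are illustrative assumptions, not conditions stated in this specification.

```python
def wallace_tm(primer: str) -> int:
    """Estimate melting temperature (deg C) as 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    if not primer or set(primer) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def starting_annealing_temp(forward: str, reverse: str, offset: int = 5) -> int:
    """Starting annealing temperature: offset deg C below the lower primer Tm."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


# Hypothetical 18-mer primers, for illustration only.
forward_primer = "ATGGAAGAAGAAAACGTG"
reverse_primer = "TTACACAGCGATTTTTGC"

print(wallace_tm(forward_primer), wallace_tm(reverse_primer))
print(starting_annealing_temp(forward_primer, reverse_primer))
```

The estimate only narrows the range to test; the remaining parameters listed above still have to be optimized for each reaction.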

In one embodiment, PCR can be used to amplify fragments from genomic 
libraries. Many genomic libraries are commercially available. Alternatively, libraries 
can be produced by any method known in the art. Preferably, the organism(s) from
which the DNA is obtained has no discernible disease or phenotypic effects. This isolated DNA
may be obtained from any cell source or body fluid (e.g., ES cells, liver, kidney, blood
cells, buccal cells, cervicovaginal cells, epithelial cells from urine, fetal cells, or any cells
present in tissue obtained by biopsy, urine, blood, cerebrospinal fluid (CSF), and tissue
exudates at the site of infection or inflammation). DNA is extracted from the cells or
body fluid using known methods of cell lysis and DNA purification. The purified DNA
is then introduced into a suitable expression system, for example a lambda phage. 
Another method for obtaining polynucleotides, for example, short, random nucleotide 
sequences, is by enzymatic digestion. 

Polynucleotides are inserted into vector backbones using methods known in the
art. For example, insert and vector DNA can be contacted, under suitable conditions,
with a restriction enzyme to create complementary or blunt ends on each molecule that
can pair with each other and be joined with a ligase. Alternatively, synthetic nucleic
acid linkers can be ligated to the termini of a polynucleotide. These synthetic linkers
can contain nucleic acid sequences that correspond to a particular restriction site in the
vector DNA. Other means are known and, in view of the teachings herein, can be
used.

The vector backbone may comprise components functional in more than one 
selected organism in order to provide a shuttle vector, for example, a bacterial origin 
of replication and a eucaryotic promoter. Alternately, the vector backbone may
comprise an integrating vector, i.e., a vector that is used for random or site-directed 
integration into a target genome. 

The final constructs can be used immediately (e.g., for introduction into ES 
cells or for liver-push assays), or stored frozen (e.g., at -20°C) until use. In some
embodiments, the constructs are linearized prior to use, for example by digestion with
suitable restriction endonucleases. 

The vectors are useful as reporters both in vitro and in vivo. The expression
cassettes of the present invention may, for example, be introduced into a selected cell
type and evaluated in culture. Further, non-invasive imaging and/or detecting of
light-emitting conjugates in mammalian subjects was described in U.S. Patent No.
5,650,135, by Contag, et al., issued 22 July 1997. Substrates of luciferase are typically
applied to the cell or system (e.g., injection into a transgenic mouse, having cells
carrying a luciferase construct, of a suitable substrate for the luciferase, for example,
luciferin).

Transgenic organisms can also be produced using the sequences described
herein. Constructs containing the luciferase genes are, for example, introduced into a
pluripotent cell (e.g., ES cell, Robertson, E. J., In: Current Communications in
Molecular Biology, Capecchi, M. R. (ed.), Cold Spring Harbor Press, Cold Spring
Harbor, N.Y. (1989), pp. 39-44) by any suitable method, for example, micro-injection,
calcium phosphate transformation, or electroporation (see below). After suitable ES
cells containing the construct in the proper location have been identified, the cells can
be inserted into an embryo, preferably a blastocyst, for example as set forth by
Bradley et al., (1992) Biotechnology, 10:534-539.

The expression cassettes of the present invention may be introduced into the
genome of an animal in order to produce transgenic, non-human animals for purposes
of practicing the methods of the present invention. In a preferred embodiment of the
present invention, the transgenic, non-human animal may be a rodent (including, but
not limited to, mice, rats, hamsters, gerbils, and guinea pigs). When a
light-generating protein is used as a reporter, imaging is typically carried out using an
intact, living, non-human transgenic animal, for example, a living, transgenic rodent
(e.g., a mouse or rat). A variety of transformation techniques are well known in the art.
Those methods include, but are not limited to, the following.

(i) Direct microinjection into nuclei: Expression cassettes can be microinjected
directly into animal cell nuclei using micropipettes to mechanically transfer the
recombinant DNA. This method has the advantage of not exposing the DNA to
cellular compartments other than the nucleus and of yielding stable recombinants at
high frequency. See, Capecchi, M., Cell 22:479-488 (1980).

For example, the expression cassettes of the present invention may be 
microinjected into the early male pronucleus of a zygote as early as possible after the 
formation of the male pronucleus membrane, and prior to its being processed by the
zygote female pronucleus. Thus, microinjection according to this method should be
undertaken when the male and female pronuclei are well separated and both are
located close to the cell membrane. See, e.g., U.S. Patent No. 4,873,191 to Wagner,
et al. (issued October 10, 1989); and Richa, J., (2001) "Production of Transgenic
Mice," Molecular Biotechnology, March 2001, vol. 17:261-8.

(ii) ES Cell Transfection: The DNA containing the expression cassettes of the
present invention can also be introduced into embryonic stem ("ES") cells. ES cell
clones which undergo homologous recombination with a targeting vector are
identified, and ES cell-mouse chimeras are then produced. Homozygous animals are
produced by mating of hemizygous chimera animals. Procedures are described in, e.g.,
Koller, B.H. and Smithies, O., (1992) "Altering genes in animals by gene targeting",
Annual Review of Immunology 10:705-30.

(iii) Electroporation: The DNA containing the expression cassettes of the
present invention can also be introduced into the animal cells by electroporation. In this
technique, animal cells are electroporated in the presence of DNA containing the
expression cassette. Electrical impulses of high field strength reversibly permeabilize
biomembranes, allowing the introduction of the DNA. The pores created during
electroporation permit the uptake of macromolecules such as DNA. Procedures are
described in, e.g., Potter, H., et al., Proc. Natl. Acad. Sci. U.S.A. 81:7161-7165
(1984); and Sambrook, ch. 16.

(iv) Calcium phosphate precipitation: The expression cassettes may also be
transferred into cells by other methods of direct uptake, for example, using calcium
phosphate. See, e.g., Graham, F., and A. Van der Eb, Virology 52:456-467 (1973);
and Sambrook, ch. 16.

(v) Liposomes: Encapsulation of DNA within artificial membrane vesicles
(liposomes) followed by fusion of the liposomes with the target cell membrane can also
be used to introduce DNA into animal cells. See Mannino, R. and S. Gould-Fogerite,
BioTechniques, 6:682 (1988).

(vi) Viral capsids: Viruses and empty viral capsids can also be used to
incorporate DNA and transfer the DNA to animal cells. For example, DNA can be
incorporated into empty polyoma viral capsids and then delivered to
polyoma-susceptible cells. See, e.g., Slilaty, S. and H. Aposhian, Science 220:725 (1983).

(vii) Transfection using polybrene or DEAE-dextran: These techniques are
described in Sambrook, ch. 16.

(viii) Protoplast fusion: Protoplast fusion typically involves the fusion of
bacterial protoplasts carrying high numbers of a plasmid of interest with cultured
animal cells, usually mediated by treatment with polyethylene glycol. See
Rassoulzadegan, M., et al., Nature, 295:257 (1982).

(ix) Ballistic penetration: Another method of introduction of nucleic acid
segments is high velocity ballistic penetration by small particles with the nucleic acid
either within the matrix of small beads or particles, or on the surface. See Klein, et al.,
Nature, 327:70-73 (1987).

Any technique that can be used to introduce DNA into the animal cells of
choice can be employed (e.g., "Transgenic Animal Technology: A Laboratory
Handbook," by Carl A. Pinkert (Editor), First Edition, Academic Press; ISBN:
0125571658; "Manipulating the Mouse Embryo: A Laboratory Manual," Brigid
Hogan, et al., ISBN: 0879693843, Publisher: Cold Spring Harbor Laboratory Press,
Pub. Date: September 1999, Second Edition). Electroporation has the advantage of
ease and has been found to be broadly applicable, but a substantial fraction of the
targeted cells may be killed during electroporation. Therefore, for sensitive cells or
cells which are only obtainable in small numbers, microinjection directly into nuclei
may be preferable. Also, where a high efficiency of DNA incorporation is especially
important, such as transformation without the use of a selectable marker (as discussed
above), direct microinjection into nuclei is an advantageous method because typically
5-25% of targeted cells will have stably incorporated the microinjected DNA.
Retroviral vectors are also highly efficient but in some cases they are subject to other
shortcomings, as described by Ellis, J., and A. Bernstein, Molec. Cell. Biol. 9:1621-1627
(1989). Where lower efficiency techniques are used, such as electroporation,
calcium phosphate precipitation or liposome fusion, it is preferable to have a selectable
marker in the expression cassette so that stable transformants can be readily selected,
as discussed above.

In some situations, introduction of the heterologous DNA will itself result in a
selectable phenotype, in which case the targeted cells can be screened directly for
homologous recombination. For example, disrupting the gene HPRT results in
resistance to 6-thioguanine. In many cases, however, the transformation will not result
in such an easily selectable phenotype and, if a low efficiency transformation technique
such as calcium phosphate precipitation is being used, it is preferable to include in the
expression cassette a selectable marker such that the stable integration of the
expression cassette in the genome will lead to a selectable phenotype. For example, if
the introduced DNA contains a neo gene, then selection for integrants can be achieved
by selecting cells able to grow on G418.

Transgenic animals prepared as above are useful for practicing the methods of
the present invention. Operably linking a promoter of interest to a reporter sequence
enables persons of skill in the art to monitor a wide variety of biological processes
involving expression of the gene from which the promoter is derived. The transgenic
animals of the present invention that comprise the expression cassettes of the present
invention provide a means for skilled artisans to observe those processes as they occur
in vivo, as well as to elucidate the mechanisms underlying those processes.

The monitoring of luciferase reporter expression cassettes using non-invasive
whole animal imaging has been described (Contag, C. et al., U.S. Patent No. 5,650,135,
July 22, 1997; Contag, P., et al., Nature Medicine 4(2):245-247, 1998; Contag, C., et
al., OSA TOPS on Biomedical Optical Spectroscopy and Diagnostics 3:220-224, 1996;
Contag, C.H., et al., Photochemistry and Photobiology 66(4):523-531, 1997; Contag,
C.H., et al., Molecular Microbiology 18(4):593-603, 1995). Such imaging typically
uses at least one photodetector device element, for example, a charge-coupled device
(CCD) camera.

Accordingly, the amount of light produced by a red luciferase encoded by a
polynucleotide disclosed herein (e.g., in a cell transformed with a polynucleotide of
the present invention or in a transgenic animal comprising cells expressing a red
luciferase encoded by the polynucleotides of the present invention) can be quantified
using either an intensified photon-counting camera or a cooled integrating camera.
With respect to the cooled integrating type of camera, the particular instrument can,
for example, be selected from the following three makes/models: (1) Princeton
Instruments Model LN/CCD 1340-1300-EB/1; (2) Roper model LN-1300EB cooled
CCD camera (available from Roper Scientific, Inc., Tucson, Arizona); and (3)
Spectral Instruments model 600 cooled CCD camera (available from Spectral
Instruments, Inc., Tucson, Arizona). A preferred apparatus is the Princeton
Instruments camera number XEN-5, located at Xenogen Corporation, Alameda,
California. This camera uses a charge-coupled device array (CCD array) to generate
a signal proportional to the number of photons per selected unit area. The selected
unit area may be as small as that detected by a single CCD pixel, or, if binning is used,
that detected by any selected group of pixels. This signal may optionally be routed
through an image processor, and is then transmitted to a computer (either a PC
running Windows NT (Dell Computer Corporation; Microsoft Corporation,
Redmond, WA) or a Macintosh (Apple Computer, Cupertino, CA)) running an
image-processing software application, such as "LivingImage" (Xenogen Corporation,
Alameda, CA). The software and/or image processor are used to acquire an image,
stored as a computer data file. The data generally take the form of (x, y, z) values,
where x and y represent the spatial coordinates of the point or area from which the
signal was collected, and z represents the amount of signal at that point or area,
expressed as "Relative Light Units" (RLUs).

To facilitate interpretation, the data are typically displayed as a "pseudocolor"
image, where a color spectrum is used to denote the z value (amount of signal) at a
particular point. Further, the pseudocolor signal image is typically superimposed over
a reflected light or "photographic" image to provide a frame of reference.
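
The (x, y, z) representation, binning, and pseudocolor display described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: binning is modelled as summing non-overlapping blocks of pixel counts, the four-color palette and the linear mapping are arbitrary choices, and the 4 x 4 pixel values are made up. It is not a description of the LivingImage software.

```python
def bin_pixels(image: list[list[int]], factor: int) -> list[list[int]]:
    """Sum non-overlapping factor x factor blocks of pixel counts."""
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be multiples of the bin factor")
    return [
        [
            sum(
                image[r + dr][c + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


def to_xyz(image: list[list[int]]) -> list[tuple[int, int, int]]:
    """Flatten a 2D signal image into (x, y, z) tuples, row by row."""
    return [(x, y, z) for y, row in enumerate(image) for x, z in enumerate(row)]


# Arbitrary low-to-high palette chosen for this illustration.
PALETTE = ["blue", "green", "yellow", "red"]


def pseudocolor(z: int, z_max: int) -> str:
    """Map a signal value linearly onto the palette (low = blue, high = red)."""
    if z_max <= 0:
        return PALETTE[0]
    index = min(z * len(PALETTE) // (z_max + 1), len(PALETTE) - 1)
    return PALETTE[index]


# Made-up 4 x 4 image of raw pixel counts.
raw = [
    [0, 1, 2, 3],
    [1, 1, 3, 3],
    [0, 0, 8, 9],
    [0, 2, 9, 10],
]
binned = bin_pixels(raw, 2)
z_max = max(z for _, _, z in to_xyz(binned))
for x, y, z in to_xyz(binned):
    print(x, y, z, pseudocolor(z, z_max))
```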

It will be appreciated that if the signal is acquired on a camera that has been
calibrated using a stable photo-emission standard (available from, e.g., Xenogen
Corporation), the RLU signal values from any camera can be compared to the RLUs
from any other camera that has been calibrated using the same photo-emission
standard. Further, after calibrating the photo-emission standard for an absolute photon
flux (photons emitted from a unit area in a unit of time), one of skill in the art can
convert the RLU values from any such camera to photon flux values, which then
allows for the estimation of the number of photons emitted per unit time, for example,
by a cell transformed with a RR luciferase polynucleotide of the present invention.
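
The calibration logic can be made concrete with a small Python sketch. It assumes a linear camera response, so that a single factor (standard flux divided by the RLU reading of the standard) converts RLUs to photon flux; the flux of the standard and all RLU readings below are hypothetical numbers chosen for illustration.

```python
def calibration_factor(standard_flux: float, standard_rlu: float) -> float:
    """Photon flux represented by one RLU on a given camera (linear response assumed)."""
    if standard_rlu <= 0:
        raise ValueError("the standard must give a positive RLU reading")
    return standard_flux / standard_rlu


def rlu_to_flux(rlu: float, factor: float) -> float:
    """Convert an RLU reading to photon flux using a camera-specific factor."""
    return rlu * factor


# Hypothetical standard emitting 1.0e6 photons per second per square centimeter.
STANDARD_FLUX = 1.0e6

# Hypothetical readings of the same standard on two different cameras.
factor_a = calibration_factor(STANDARD_FLUX, standard_rlu=2000.0)
factor_b = calibration_factor(STANDARD_FLUX, standard_rlu=4000.0)

# The same sample reads differently in RLUs on each camera ...
flux_a = rlu_to_flux(800.0, factor_a)
flux_b = rlu_to_flux(1600.0, factor_b)

# ... but converts to the same photon flux once each camera is calibrated.
print(flux_a, flux_b)
```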

The above-described cameras can be used to monitor light production mediated
by the light-generating protein (e.g., a native and/or modified, optimized Red Railroad
Worm red luciferase of the present invention) for both in vitro and in vivo applications.

The following examples are intended only to illustrate the present invention and 
should in no way be construed as limiting the subject invention. 

Experimental 
Example 1

Modification of Phrixothrix Luciferase
Modification of a native railroad worm red luciferase-encoding sequence
(GENBANK Accession No. AF139645) to a first optimized sequence (RRLUCX)
was performed following the guidance of the present specification. The modified,
optimized polynucleotide sequence was synthesized by Integrated DNA Technologies
(Coralville, Iowa). The resulting optimized sequence did not produce light. The
original native sequence was checked relative to the luciferase sequence in the clone
(Ph^, described in Viviani, V.R., et al., Biochemistry 38:8271-8279, 1999) from
which the original sequence was derived. The original clone (Ph^) was
independently sequenced and several sequence errors were discovered relative to the
AF139645 sequence. The correct sequence of the original clone is presented in the
top line of Figure 2 (SEQ ID NO:3) and in Figure 3 (SEQ ID NO:3, polypeptide SEQ
ID NO:4).

The first optimized sequence RRLUCX was then modified, based on the
information obtained in the independent sequence of the native isolate, in order to
obtain a light-generating polypeptide. Modification of the RRLUCX sequence was
performed following the guidance of the present specification and using a
QuikChange™ kit (Stratagene, La Jolla, CA) and following the manufacturer's
instructions for the kit.

Table 2 (above) is a summary of the nucleic acid modifications made to the
RRLUCX sequence in order to obtain the optimized, modified Red Railroad Worm
luciferase sequence (labeled "RRLUCXC" in Table 2, and "RRW red LUC optimized"
in Figure 2). The nucleotide (SEQ ID NO:1) and protein (SEQ ID NO:2) sequences
of the RRW red LUC optimized sequence are presented in Figure 1. Figure 2 presents
a nucleotide sequence comparison between the native Red Railroad Worm luciferase
(SEQ ID NO:3) and the RRW red LUC optimized sequence (SEQ ID NO:1).
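
One way to check that an optimized coding sequence still encodes the native polypeptide is to translate both and compare them codon by codon. The Python sketch below does this with the standard genetic code for the first 16 codons of Figure 1 (optimized) and Figure 3 (native). The codons are given as they read after correcting apparent scanning errors, which is an assumption, and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order at each position.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(codons: list[str]) -> str:
    """Translate a list of lowercase codons into one-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in codons)


def changed_codons(seq_a: list[str], seq_b: list[str]) -> list[tuple[int, str, str]]:
    """Return (1-based position, codon_a, codon_b) wherever the codons differ."""
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# First 16 codons as read from Figure 1 (optimized) and Figure 3 (native).
optimized = "atg gaa gaa gaa aac gtg gtg aat gga gat cgg cct agg gat ctg gtg".split()
native = "atg gaa gaa gaa aac gtt gtg aat gga gat cgt cct cgt gat cta gtt".split()

print(translate(optimized))
print(translate(native))
for position, opt_codon, nat_codon in changed_codons(optimized, native):
    print(position, nat_codon, "->", opt_codon, CODON_TABLE[opt_codon])
```

In this fragment every changed codon is synonymous, which is the property expected of codon optimization; a mismatch between the two translations would point to a sequence error of the kind described in this example.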

Example 2 

Expression of Modified RR Luciferase in Host Cells

Plasmids expressing the modified luciferase polynucleotides are introduced into
mammalian host cells to determine relative luciferase activities present in their prepared
cell extracts. Plasmid DNAs are delivered into cultured mammalian cells using a
modified calcium phosphate-mediated transfection procedure, as described for example
in Ausubel et al., supra. Post-transfection, cells are harvested and lysed. Luciferase
activity of cell lysates is determined and quantified by methods known in the art, for
example using the Luciferase Assay System (Promega, Madison, WI) and following
the manufacturer's instructions. Peroxisome modification and/or codon optimization
increases expression.

Example 3 

In vivo Measurement of Modified Luciferases in Cells

Expression of luciferase may also be measured from living cells by adding the
substrate luciferin to the growth medium. A variety of types of cells may be employed,
for example, eucaryotic cells (e.g., insect, animal, mammalian, plant or fungal cells) or
procaryotic cells (e.g., bacterial cells). Luminescence is thus emitted from the cells
without disrupting their physiology.

In vivo expression of the luciferase reporter gene by cells can be determined,
for example, by evaluating light production, mediated by the luciferase polypeptide,
using a Princeton Instruments Model LN/CCD 1340-1300-EB/1 CCD camera. The
cells, for example, may be grown in solution in microtiter plates and light production
from each well of the microtiter plate evaluated using the CCD camera. Alternately,
cells that grow on solid media may be imaged on the solid media in the presence of
luciferin substrate. For example, bacteria or fungal cells expressing the modified,
optimized luciferase sequence of the present invention may be streaked onto solid media
plates and light production evaluated for patches and/or single colonies.

For example, bacterial cells were transformed with a plasmid having an
expression cassette comprising the sequence presented as SEQ ID NO:1. Transformed
cells were selected. The transformed cells were streaked onto a plate of solid growth
media. Light output was measured from the plate using a Jobin Yvon-Spex Liquid
Nitrogen Cooled Spectrophotometer (320 triple image axial direct drive system; Jobin
Yvon Horiba, Edison, NJ). The RRLUCXC polynucleotide sequence (SEQ ID NO:1)
was seen to be completely functional when expressed in the host cells and produced
light with a λmax of approximately 622 nm.

As is apparent to one of skill in the art, various modifications and variations of
the above embodiments can be made without departing from the spirit and scope of
this invention. These modifications and variations are within the scope of this
invention.

What is claimed is:

1. An isolated polynucleotide, comprising a first polynucleotide having at least
about 85% sequence identity to SEQ ID NO:1, wherein said first polynucleotide
encodes a polypeptide capable of mediating light-production.

2. The polynucleotide of claim 1, wherein said first polynucleotide has at least
about 90% sequence identity to SEQ ID NO:1.

3. The polynucleotide of claim 2, wherein said first polynucleotide has at least
about 95% sequence identity to SEQ ID NO:1.

4. The polynucleotide of claim 3, wherein said first polynucleotide has at least
about 98% sequence identity to SEQ ID NO:1.

5. The polynucleotide of claim 4, wherein said first polynucleotide consists of
the sequence presented as SEQ ID NO:1.

6. An expression cassette comprising the isolated polynucleotide of any of
claims 1-5.

7. A cell comprising an expression cassette of claim 6.

8. A non-human, transgenic animal, comprising an expression cassette of claim 6.

FIGURE 1 (sheet 1 of 4)

atg gaa gaa gaa aac gtg gtg aat gga gat cgg cct agg gat ctg gtg 48
Met Glu Glu Glu Asn Val Val Asn Gly Asp Arg Pro Arg Asp Leu Val
1 5 10 15

ttt ccc ggc aca gca gga ctc cag ctg tac cag tca ctg tat aag tat 96
Phe Pro Gly Thr Ala Gly Leu Gln Leu Tyr Gln Ser Leu Tyr Lys Tyr
20 25 30

tca tac atc act gac ggg ata atc gac gcc cat acc aac gag gtc atc 144
Ser Tyr Ile Thr Asp Gly Ile Ile Asp Ala His Thr Asn Glu Val Ile
35 40 45

tca tat gct cag atc ttt gaa acc tcc tgc cgg ctg gca gtg tca ctg 192
Ser Tyr Ala Gln Ile Phe Glu Thr Ser Cys Arg Leu Ala Val Ser Leu
50 55 60

gag aag tat ggc ctg gat cac aac aat gtg gtg gcc atc tgt tct gaa 240
Glu Lys Tyr Gly Leu Asp His Asn Asn Val Val Ala Ile Cys Ser Glu
65 70 75 80

aac aac ata cac ttt ttc ggc ccc ctg att gct gcc ctg tac caa ggc 288
Asn Asn Ile His Phe Phe Gly Pro Leu Ile Ala Ala Leu Tyr Gln Gly
85 90 95

atc cca atg gca aca tca aac gac atg tac aca gag agg gag atg ata 336
Ile Pro Met Ala Thr Ser Asn Asp Met Tyr Thr Glu Arg Glu Met Ile
100 105 110

ggc cat ctg aac atc tcc aag cca tgc ctg atg ttc tgt tca aag aaa 384
Gly His Leu Asn Ile Ser Lys Pro Cys Leu Met Phe Cys Ser Lys Lys
115 120 125

tca ctg ccc ttc att ctg aag gtg cag aag cac ctg gac ttt ctg aaa 432
Ser Leu Pro Phe Ile Leu Lys Val Gln Lys His Leu Asp Phe Leu Lys
130 135 140

FIGURE 1 (sheet 2 of 4)

aaa gtc ata gtc att gat tcc atg tac gat atc aat ggc gtg gag tgc 480
Lys Val Ile Val Ile Asp Ser Met Tyr Asp Ile Asn Gly Val Glu Cys
145 150 155 160

gtc ttc tcc ttt gtc tcg agg tac act gat cac gcc ttc gac cca gtg 528
Val Phe Ser Phe Val Ser Arg Tyr Thr Asp His Ala Phe Asp Pro Val
165 170 175

aag ttc aac ccc aaa gag ttc gac ccc ctc gaa aga acc gcc ctg att 576
Lys Phe Asn Pro Lys Glu Phe Asp Pro Leu Glu Arg Thr Ala Leu Ile
180 185 190

atg aca tca tct ggg aca act gga ctg cct aag ggg gtc gtg atc tcc 624
Met Thr Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Val Ile Ser
195 200 205

cac aga tct ata act atc aga ttc gtc cat tct tcc gat ccc atc tac 672
His Arg Ser Ile Thr Ile Arg Phe Val His Ser Ser Asp Pro Ile Tyr
210 215 220

ggc acc agg att gcc cca gac aca tca att ctg gct atc gca ccc ttc 720
Gly Thr Arg Ile Ala Pro Asp Thr Ser Ile Leu Ala Ile Ala Pro Phe
225 230 235 240

cat cac gcc ttt gga ctg ttt act gca ctg gct tac ttc cct gtc gga 768
His His Ala Phe Gly Leu Phe Thr Ala Leu Ala Tyr Phe Pro Val Gly
245 250 255

ctg aag att gtc atg gtg aag aaa ttt gag ggc gag ttc ttt ctg aaa 816
Leu Lys Ile Val Met Val Lys Lys Phe Glu Gly Glu Phe Phe Leu Lys
260 265 270

acc ata caa aat tac aag atc gct tct att gtc gtg cct cct cct att 864
Thr Ile Gln Asn Tyr Lys Ile Ala Ser Ile Val Val Pro Pro Pro Ile
275 280 285

FIGURE 1 (sheet 3 of 4)

atg gtc tat ctg gct aag tcc ccc ctg gtc gat gaa tac aat tta tct 912
Met Val Tyr Leu Ala Lys Ser Pro Leu Val Asp Glu Tyr Asn Leu Ser
290 295 300

tct ctg acc gaa atc gca tgc gga ggc tct cct ctg ggg aga gac atc 960
Ser Leu Thr Glu Ile Ala Cys Gly Gly Ser Pro Leu Gly Arg Asp Ile
305 310 315 320

gca gat aaa gtc gcc aag aga ctg aaa gtg cat gga atc ctc cag gga 1008
Ala Asp Lys Val Ala Lys Arg Leu Lys Val His Gly Ile Leu Gln Gly
325 330 335

tat ggg ctg acc gag acc tgt tcc gct ctg ata ctg tct ccc aac gat 1056
Tyr Gly Leu Thr Glu Thr Cys Ser Ala Leu Ile Leu Ser Pro Asn Asp
340 345 350

cgg gaa ctg aaa aag ggg gca atc gga acc cct atg cca tac gtg caa 1104
Arg Glu Leu Lys Lys Gly Ala Ile Gly Thr Pro Met Pro Tyr Val Gln
355 360 365

gtg aaa gtg atc gac atc aat acc ggg aag gcc ctg gga cca aga gag 1152
Val Lys Val Ile Asp Ile Asn Thr Gly Lys Ala Leu Gly Pro Arg Glu
370 375 380

aaa ggc gag atc tgc ttc aag tct cag atg ctg atg aag ggg tat cac 1200
Lys Gly Glu Ile Cys Phe Lys Ser Gln Met Leu Met Lys Gly Tyr His
385 390 395 400

aac aat cct cag gcc act agg gat gct ctg gac aag gat ggg tgg ctg 1248
Asn Asn Pro Gln Ala Thr Arg Asp Ala Leu Asp Lys Asp Gly Trp Leu
405 410 415

cac act ggg gac ctg gga tat tac gac gaa gac aga ttt atc tat gtc 1296
His Thr Gly Asp Leu Gly Tyr Tyr Asp Glu Asp Arg Phe Ile Tyr Val
420 425 430

FIGURE 1 (sheet 4 of 4)

gtg gac agg ctg aaa gag ctg atc aag tat aaa ggg tat cag gtc gcc 1344
Val Asp Arg Leu Lys Glu Leu Ile Lys Tyr Lys Gly Tyr Gln Val Ala
435 440 445

cct gct gag ttg gaa aac ctg ctg ttg cag cac ccc aat atc tct gat 1392
Pro Ala Glu Leu Glu Asn Leu Leu Leu Gln His Pro Asn Ile Ser Asp
450 455 460

gcc ggc gtg att gga att ccg gac gaa ttt gct ggt caa tta cct tcc 1440
Ala Gly Val Ile Gly Ile Pro Asp Glu Phe Ala Gly Gln Leu Pro Ser
465 470 475 480

gcc tgt gtg gtg ctg gag cct ggc aag aca atg acc gag aaa gaa gtg 1488
Ala Cys Val Val Leu Glu Pro Gly Lys Thr Met Thr Glu Lys Glu Val
485 490 495

cag gac tac att gca gag ctg gtc act aca act aaa cat ctg agg ggg 1536
Gln Asp Tyr Ile Ala Glu Leu Val Thr Thr Thr Lys His Leu Arg Gly
500 505 510

ggg gtc gtc ttt ata gat tcc att cca aag ggc cca aca ggg aaa ctg 1584
Gly Val Val Phe Ile Asp Ser Ile Pro Lys Gly Pro Thr Gly Lys Leu
515 520 525

atg aga aac gaa ctg agg gca atc ttt gct cgg gaa cag gca aaa atc 1632
Met Arg Asn Glu Leu Arg Ala Ile Phe Ala Arg Glu Gln Ala Lys Ile
530 535 540

gct gtg taa 1641
Ala Val
545

Figure 3 (sheet 1 of 4)



atg gaa gaa gaa aac gtt gtg aat gga gat cgt cct cgt gat cta gtt 48
Met Glu Glu Glu Asn Val Val Asn Gly Asp Arg Pro Arg Asp Leu Val
1 5 10 15

ttt cct ggc aca gca gga cta caa tta tat caa tca tta tat aaa tat 96
Phe Pro Gly Thr Ala Gly Leu Gln Leu Tyr Gln Ser Leu Tyr Lys Tyr
20 25 30

tca tat att act gac gga ata atc gat gcc cat acc aat gaa gta ata 144
Ser Tyr Ile Thr Asp Gly Ile Ile Asp Ala His Thr Asn Glu Val Ile
35 40 45

tca tat gct caa ata ttt gaa acc agc tgc cgc ttg gca gtt agt cta 192
Ser Tyr Ala Gln Ile Phe Glu Thr Ser Cys Arg Leu Ala Val Ser Leu
50 55 60

gaa aaa tat ggc ttg gat cat aac aat gtt gtg gca ata tgc agt gaa 240
Glu Lys Tyr Gly Leu Asp His Asn Asn Val Val Ala Ile Cys Ser Glu
65 70 75 80

aac aac ata cac ttt ttt ggc cct tta att gct gct tta tac caa gga 288
Asn Asn Ile His Phe Phe Gly Pro Leu Ile Ala Ala Leu Tyr Gln Gly
85 90 95

ata cca atg gca aca tca aat gat atg tac aca gaa agg gag atg att 336
Ile Pro Met Ala Thr Ser Asn Asp Met Tyr Thr Glu Arg Glu Met Ile
100 105 110

ggc cat ttg aat ata tcg aaa cca tgc ctt atg ttt tgt tca aag aaa 384
Gly His Leu Asn Ile Ser Lys Pro Cys Leu Met Phe Cys Ser Lys Lys
115 120 125

tca ctc cca ttt att ctg aaa gta caa aaa cat cta gat ttc ctt aaa 432
Ser Leu Pro Phe Ile Leu Lys Val Gln Lys His Leu Asp Phe Leu Lys
130 135 140

Figure 3 (sheet 2 of 4)

aaa gtc ata gtc att gat agt atg tac gat atc aat ggc gtt gaa tgc 480
Lys Val Ile Val Ile Asp Ser Met Tyr Asp Ile Asn Gly Val Glu Cys
145 150 155 160

gta ttt agc ttt gtt tca cgt tat act gat cac gcc ttt gat cca gtg 528
Val Phe Ser Phe Val Ser Arg Tyr Thr Asp His Ala Phe Asp Pro Val
165 170 175

aaa ttt aac cca aaa gag ttt gat ccc ttg gaa aga acc gca tta att 576
Lys Phe Asn Pro Lys Glu Phe Asp Pro Leu Glu Arg Thr Ala Leu Ile
180 185 190

atg aca tca tct gga aca act gga ttg cct aaa ggg gta gta ata agc 624
Met Thr Ser Ser Gly Thr Thr Gly Leu Pro Lys Gly Val Val Ile Ser
195 200 205

cat aga agt ata act ata aga ttc gtc cat agc agt gat ccc atc tat 672
His Arg Ser Ile Thr Ile Arg Phe Val His Ser Ser Asp Pro Ile Tyr
210 215 220

ggt act cgt att gct cca gat aca tca att ctt gct ata gca ccg ttc 720
Gly Thr Arg Ile Ala Pro Asp Thr Ser Ile Leu Ala Ile Ala Pro Phe
225 230 235 240

cat cat gcc ttt gga ctg ttt act gca cta gct tac ttt cca gta gga 768
His His Ala Phe Gly Leu Phe Thr Ala Leu Ala Tyr Phe Pro Val Gly
245 250 255

ctt aag att gta atg gtg aag aaa ttt gag ggc gaa ttc ttc tta aaa 816
Leu Lys Ile Val Met Val Lys Lys Phe Glu Gly Glu Phe Phe Leu Lys
260 265 270

acc ata caa aat tac aaa atc gct tct att gta gtt cct cct cca att 864
Thr Ile Gln Asn Tyr Lys Ile Ala Ser Ile Val Val Pro Pro Pro Ile
275 280 285

Figure 3 (sheet 3 of 4)

atg gta tat ttg gct aaa agt cca tta gtc gat gaa tac aat tta tcg 912
Met Val Tyr Leu Ala Lys Ser Pro Leu Val Asp Glu Tyr Asn Leu Ser 

290 295 300 

agc tta acg gaa att gct tgt gga ggg tct cct tta gga aga gat atc 960
Ser Leu Thr Glu Ile Ala Cys Gly Gly Ser Pro Leu Gly Arg Asp Ile
305 310 315 320 

gca gat aaa gta gca aag aga ttg aaa gta cat gga atc cta caa gga 1008
Ala Asp Lys Val Ala Lys Arg Leu Lys Val His Gly Ile Leu Gln Gly

325 330 335 

tat gga tta acc gaa acc tgc agc gct cta ata ctt agc ccc aat gat 1056
Tyr Gly Leu Thr Glu Thr Cys Ser Ala Leu Ile Leu Ser Pro Asn Asp

340 345 350 

cga gaa ctt aaa aaa ggt gca att gga acg cct atg cca tat gtt caa 1104 
Arg Glu Leu Lys Lys Gly Ala Ile Gly Thr Pro Met Pro Tyr Val Gln

355 360 365 

gtt aaa gtt ata gat atc aat act ggg aag gcg cta gga cca aga gaa 1152
Val Lys Val Ile Asp Ile Asn Thr Gly Lys Ala Leu Gly Pro Arg Glu

370 375 380 

aaa ggc gaa ata tgc ttc aaa agt caa atg ctt atg aaa gga tat cac 1200 
Lys Gly Glu Ile Cys Phe Lys Ser Gln Met Leu Met Lys Gly Tyr His
385 390 395 400 

aac aat ccg caa gca act cgt gat gct ctt gac aaa gat ggt tgg ctt 1248
Asn Asn Pro Gln Ala Thr Arg Asp Ala Leu Asp Lys Asp Gly Trp Leu

405 410 415 

cat act ggg gat ctt gga tat tac gac gaa gac aga ttt atc tat gta 1296
His Thr Gly Asp Leu Gly Tyr Tyr Asp Glu Asp Arg Phe Ile Tyr Val

420 425 430 






Figure 3 (sheet 4 of 4) 

gtt gat cga ttg aaa gaa ctt att aaa tat aaa gga tat cag gtt gcg 1344 
Val Asp Arg Leu Lys Glu Leu Ile Lys Tyr Lys Gly Tyr Gln Val Ala

435 440 445 

cct gct gaa ctg gaa aat ctg ctt tta caa cat cca aat att tct gat 1392
Pro Ala Glu Leu Glu Asn Leu Leu Leu Gln His Pro Asn Ile Ser Asp

450 455 460 

gcg ggt gtt att gga att ccg gac gaa ttt gct ggt caa tta cct tcc 1440
Ala Gly Val Ile Gly Ile Pro Asp Glu Phe Ala Gly Gln Leu Pro Ser
465 470 475 480 

gcg tgt gtt gtg tta gag cct ggt aag aca atg acc gaa aag gaa gtt 1488 
Ala Cys Val Val Leu Glu Pro Gly Lys Thr Met Thr Glu Lys Glu Val 

485 490 495 

cag gat tat att gca gag cta gtc act aca act aaa cat ctt cga ggc 1536
Gln Asp Tyr Ile Ala Glu Leu Val Thr Thr Thr Lys His Leu Arg Gly

500 505 510 

ggt gtc gta ttt ata gat agt att cca aaa ggc cca aca gga aaa ctc 1584
Gly Val Val Phe Ile Asp Ser Ile Pro Lys Gly Pro Thr Gly Lys Leu

515 520 525 

atg aga aac gaa ctc cgt gca ata ttt gcc cgg gaa cag gca aaa tca 1632
Met Arg Asn Glu Leu Arg Ala Ile Phe Ala Arg Glu Gln Ala Lys Ser

530 535 540 

aaa tta taa 1641 
Lys Leu 
545 